Runs scale-free shared nearest neighbor (SNN) network analysis on a specified subset of features in a Seurat object.

runSSN(
  object,
  features,
  scale_free = T,
  robust_pca = F,
  data_type = c("pearson", "deviance"),
  reprocess_sct = F,
  slot = c("scale", "data"),
  batch_feature = NULL,
  do_scale = F,
  do_center = F,
  pca_var_explained = 0.9,
  weight_by_var = F,
  umap_knn = 10,
  optimize_resolution = T,
  target_purity = 0.8,
  step_size = 0.05,
  n_workers = 1,
  verbose = T
)

Arguments

object

Seurat object

features

Features to compute the SNN network on. If any features are missing from the scaled data, the scaled data is recomputed.

scale_free

Logical; whether to enforce scale-free topology. Default is T.

robust_pca

Logical; whether to run robust PCA (WARNING: computationally intensive, not recommended for large data sets). Default is F.

data_type

Data type to compute SNN on.

  • "pearson" - Pearson residuals for count data, based on a regularized negative binomial model.

  • "deviance" - deviance for count data, based on a multinomial null model (assumes each feature has a constant rate).
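As a minimal sketch of what the "pearson" option refers to, the standard Pearson residual under a negative binomial model is (x - mu) / sqrt(mu + mu^2 / theta). The helper name `pearson_residual` and the toy values of `mu` and `theta` below are assumptions for illustration only; in practice these quantities come from the fitted, regularized model (see SCTransform), not from hand-picked constants.

```r
# Pearson residual under a negative binomial model with mean `mu`
# and dispersion `theta`: variance is mu + mu^2 / theta.
# `pearson_residual` is a hypothetical helper, not part of this package.
pearson_residual <- function(x, mu, theta) {
  (x - mu) / sqrt(mu + mu^2 / theta)
}

# Toy counts for a single feature (illustrative values only)
counts <- c(0, 2, 10)

# With mu = 2 and theta = 4 the variance is 2 + 4 / 4 = 3
res <- pearson_residual(counts, mu = 2, theta = 4)
print(res)
```

Counts equal to the expected mean give a residual of zero; counts above or below it give positive or negative residuals scaled by the model's standard deviation.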

reprocess_sct

If `data_type` is "pearson", specifies whether SCTransform is rerun, regardless of whether features are missing from the existing scaled data. Default is F.

slot

Slot to use.

  • "scale" - RECOMMENDED (default)

  • "data" - Not recommended and not tested extensively. Available for exploration. If specified, `data_type` is ignored.

batch_feature

Variables to regress out. Default is NULL.

do_scale

Whether to scale data (applies only if `slot` is "data"). Default is F.

do_center

Whether to center data (applies only if `slot` is "data"). Default is F.

pca_var_explained

Proportion of variance to be explained by PCA. Uses the top N principal components that together explain `pca_var_explained` of the variance. Default is 0.9.
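The selection of the top N components can be sketched with base R's `prcomp`. This is an illustration on a random toy matrix, not the function's internal code; in particular, taking the smallest N whose cumulative variance reaches the threshold is an assumed cutoff rule.

```r
set.seed(1)

# Toy matrix: 50 observations x 8 variables (illustrative data only)
x <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by successive PCs
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

pca_var_explained <- 0.9

# Assumed rule: smallest N whose cumulative variance reaches the threshold
n_pcs <- which(cum_var >= pca_var_explained)[1]

# Keep only the first N principal component scores
embedding <- pca$x[, seq_len(n_pcs), drop = FALSE]
print(n_pcs)
```

Raising `pca_var_explained` retains more components and therefore more detail (and more noise); lowering it gives a more compact embedding.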

weight_by_var

Whether to weight the feature embedding by the variance of each PC. Default is F.

umap_knn

Number of neighboring points used in local approximations of the UMAP manifold structure. Larger values preserve more global structure at the expense of detailed local structure. In general, this parameter should be in the range 5 to 50. Default is 10.

optimize_resolution

Logical specifying whether to identify the optimal clustering resolution. The optimal resolution is identified using the target purity criterion (see `target_purity`). Default is T.

target_purity

Target purity for identifying optimal cluster resolution. Default is 0.8.

step_size

Step size between consecutive resolutions to test. Default is 0.05.
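How `target_purity` and `step_size` might interact can be sketched as a simple grid search. Everything below is an assumption made for illustration: `toy_purity` is a hypothetical stand-in rather than the package's purity metric, the 0.05 to 1 search range is arbitrary, and the selection rule (highest resolution that still meets the target) may differ from the actual implementation.

```r
# Hypothetical stand-in for a purity score that decreases as the
# resolution grows; NOT the package's purity metric.
toy_purity <- function(resolution) exp(-resolution)

target_purity <- 0.8
step_size <- 0.05

# Candidate resolutions spaced `step_size` apart (range is an assumption)
resolutions <- seq(from = step_size, to = 1, by = step_size)
purity <- vapply(resolutions, toy_purity, numeric(1))

# Assumed rule: highest resolution whose purity still meets the target
passing <- resolutions[purity >= target_purity]
best_resolution <- if (length(passing) > 0) max(passing) else NA_real_
print(best_resolution)
```

A smaller `step_size` tests more candidate resolutions, which gives a finer search at the cost of more clustering runs.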

n_workers

Number of workers for parallel implementation. Default is 1.

verbose

Print progress. Default is T.

Value

Cell x Gene Seurat object, with gene-centric UMAP embedding and associated gene programs

See also

findNetworkFeatures for finding features, SCTransform for gene count normalization and scaling, nullResiduals for deviance calculations, scaleFreeNet for scale-free topology transform.

Author

Nicholas Mikolajewicz