Extended version of the original prepSeurat function. In addition to running standard Seurat object checks, prepSeurat2 ensures that the correct species is specified, sets the cluster resolution, and optionally subsets and downsamples the data. If the data are subset, expression values are normalized again and (optionally) rescaled.

prepSeurat2(
  object,
  e2s = NULL,
  species = NULL,
  resolution = NULL,
  subset.data = NULL,
  subsample = 1,
  terms2drop = NULL,
  rmv.pattern = NULL,
  reprocess.n.var = 3000,
  neighbors.reprocessed = FALSE,
  scale.reprocessed = FALSE,
  use.integrated = TRUE,
  keep.default.assay.only = FALSE,
  coerce.assay.used.to.default = TRUE,
  barcode.recode = NULL
)

Arguments

object

Seurat object.

e2s

Ensembl ID to gene symbol mapping vector. A named character vector in which the names are Ensembl IDs and the values are gene symbols.
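
As a minimal sketch of the expected shape, the vector below maps three human Ensembl gene IDs to their symbols. The choice of genes is purely illustrative; in practice the mapping would cover every gene in the object.

```r
# Illustrative mapping only: names are Ensembl IDs, values are gene symbols.
e2s <- c(
  ENSG00000141510 = "TP53",
  ENSG00000012048 = "BRCA1",
  ENSG00000146648 = "EGFR"
)

# Look up symbols by Ensembl ID; IDs absent from the mapping yield NA.
ids <- c("ENSG00000141510", "ENSG00000000000")
symbols <- unname(e2s[ids])
```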

species

Species. One of "Mm" (Mus musculus) or "Hs" (Homo sapiens).

resolution

Cluster resolution. Numeric in [0, Inf) specifying the resolution for data clustering. If a clustering at the requested resolution already exists, no new clustering is performed.
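
How prepSeurat2 detects an existing clustering is not documented here. The sketch below assumes Seurat's usual convention of storing cluster assignments in meta.data columns named "<graph>_res.<resolution>"; the helper hasResolution() and the mock meta.data are hypothetical.

```r
# Mock meta.data; column names follow the "<graph>_res.<resolution>" convention.
meta <- data.frame(
  RNA_snn_res.0.5 = factor(c(0, 0, 1, 1)),
  RNA_snn_res.1   = factor(c(0, 1, 2, 3)),
  row.names = paste0("cell", 1:4)
)

# Hypothetical helper: TRUE if a clustering at `resolution` is already stored.
hasResolution <- function(meta, resolution) {
  any(endsWith(colnames(meta), paste0("_res.", resolution)))
}

hasResolution(meta, 0.5)  # TRUE: existing clustering could be reused
hasResolution(meta, 2)    # FALSE: new clustering would be needed
```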

subset.data

Data frame specifying how to subset the Seurat object. It must contain two columns: 'field' and 'subgroups'. The 'field' column specifies which meta.data field to subset on, and the 'subgroups' column specifies which subgroups to include within that field. See scMiko::subsetSeurat() for details.
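
The sketch below shows one plausible layout for this data frame and applies it to mock meta.data with base R. It assumes one row per retained subgroup and a hypothetical meta.data field named "condition"; scMiko::subsetSeurat() may interpret the table differently, so treat this as an illustration of the idea only.

```r
# Assumed layout: one row per subgroup to keep within a meta.data field.
subset.data <- data.frame(
  field = c("condition", "condition"),
  subgroups = c("ctrl", "treated"),
  stringsAsFactors = FALSE
)

# Mock meta.data with a hypothetical "condition" field.
meta <- data.frame(
  condition = c("ctrl", "treated", "ctrl", "knockout"),
  row.names = paste0("cell", 1:4)
)

# Keep cells whose value in each listed field is among the requested subgroups.
keep <- rep(TRUE, nrow(meta))
for (f in unique(subset.data$field)) {
  groups <- subset.data$subgroups[subset.data$field == f]
  keep <- keep & meta[[f]] %in% groups
}
kept.cells <- rownames(meta)[keep]
```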

subsample

Numeric in [0, 1] specifying the fraction of cells to include in the analysis. Default is 1. See scMiko::downsampleSeurat() for details.
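
To illustrate what a fraction means here, the base R sketch below draws a random subset of mock cell barcodes. Uniform sampling without replacement is an assumption; scMiko::downsampleSeurat() may select cells differently.

```r
set.seed(1)  # make the illustrative draw reproducible

cells <- paste0("cell", 1:100)  # mock cell barcodes
subsample <- 0.25               # fraction of cells to retain

# Number of cells to keep, then a random draw without replacement.
n.keep <- round(length(cells) * subsample)
kept <- sample(cells, size = n.keep)
```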

terms2drop

Character vector of terms to omit from the Seurat object, reducing its memory footprint by dropping components that the current analysis will not use. Supported terms are: "pca", "umap", "ica", "tsne", "nmf", "corr", "gsva", "deg", "counts", "data", "scale", "rna", "sct", "integrated", "graphs", "integration.anchors".
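
Because only the listed terms are supported, it can help to check a request before passing it in. The following is a small base R sketch of such a check; the requested vector, including the deliberately unsupported "heatmap", is made up for illustration.

```r
# Supported terms, copied from the list above.
supported <- c(
  "pca", "umap", "ica", "tsne", "nmf", "corr", "gsva", "deg",
  "counts", "data", "scale", "rna", "sct", "integrated",
  "graphs", "integration.anchors"
)

# Hypothetical request containing one unsupported term.
requested <- c("tsne", "counts", "heatmap")

terms2drop <- intersect(requested, supported)   # terms that can be dropped
unsupported <- setdiff(requested, supported)    # terms to report back
```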

rmv.pattern

Character pattern specifying the names of variables to remove from the global environment. Passed to scMiko::clearGlobalEnv(pattern = rmv.pattern). Useful if the object is large.
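
The sketch below mimics pattern-based removal with base R in a scratch environment, so nothing in the real global environment is touched. It assumes the pattern is treated as a regular expression, as in ls(pattern = ...); the behavior of scMiko::clearGlobalEnv() itself is not shown here.

```r
# Scratch environment standing in for the global environment.
env <- new.env()
assign("so.raw", 1:10, envir = env)
assign("so.filtered", 1:5, envir = env)
assign("params", list(), envir = env)

# Remove every variable whose name matches the pattern.
rmv.pattern <- "^so"
rm(list = ls(envir = env, pattern = rmv.pattern), envir = env)

remaining <- ls(envir = env)
```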

reprocess.n.var

Number of variable genes to use if the data are reprocessed. Default is 3000. Note that if an integrated assay is available, variable features are first identified for the reprocessed data set and then merged with the variable features present in the integrated assay, so the final set may contain more variable features than this parameter specifies.
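
A toy example shows why the merged set can exceed the requested number. Gene names are placeholders, and taking a simple union is an assumption about how the two sets are merged.

```r
reprocess.n.var <- 3

# Placeholder feature sets for illustration.
reprocessed.features <- c("geneA", "geneB", "geneC")           # top 3 after reprocessing
integrated.features <- c("geneB", "geneC", "geneD", "geneE")   # already in integrated assay

# Merging by union keeps every feature found in either set.
merged <- union(reprocessed.features, integrated.features)
length(merged) > reprocess.n.var  # TRUE: 5 features despite requesting 3
```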

neighbors.reprocessed

Specifies whether to compute a new neighborhood graph if graphs are missing. Note that graphs are removed whenever the data are subset. Set to TRUE if downstream clustering is anticipated. Default is FALSE.

scale.reprocessed

If the data are reprocessed (i.e., normalized again), specifies whether scaling should also be performed. Default is FALSE.

use.integrated

If TRUE, sets the default assay to "integrated" if it is present in the Seurat object. Default is TRUE.

keep.default.assay.only

Specify whether to omit assays that are not default. Default is FALSE.

coerce.assay.used.to.default

Specify whether to coerce assay used to default assay. Necessary if omitting assays (e.g., integrated). Default is TRUE.

barcode.recode

List specifying how to recode barcodes. Default is NULL. See recodeBarcode() for details.

Value

List containing the prepped Seurat object, the default assay, and the number of cells in the Seurat object.
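
A hedged usage sketch follows. It assumes the scMiko package is installed and that a Seurat object named so (a hypothetical name) already exists in the session; the call is skipped otherwise. The argument values are illustrative, and the names of the returned list elements are not documented here, so the result is inspected with str() rather than accessed by name.

```r
# Illustrative argument values; all argument names come from the usage above.
prep.args <- list(
  species = "Hs",
  resolution = 0.5,
  subsample = 0.5,
  terms2drop = c("tsne", "ica"),
  neighbors.reprocessed = TRUE
)

# Run only if scMiko is available and a Seurat object `so` exists (assumption).
if (requireNamespace("scMiko", quietly = TRUE) && exists("so")) {
  prepped <- do.call(scMiko::prepSeurat2, c(list(object = so), prep.args))
  str(prepped, max.level = 1)  # inspect the top-level list elements
}
```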

Author

Nicholas Mikolajewicz