Identify pseudotime-dependent genes using a random forest (RF) model. Given a Seurat object and a set of highly variable genes (hvg), the expression data are split into training and test sets, and a random forest model is trained to identify genes that vary with pseudotime. Model performance is evaluated on the test set. The random forest model is fit using the R 'parsnip' package.
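The workflow described above can be sketched directly with 'parsnip' on simulated data. This is a minimal illustration, not the function's actual implementation: the "ranger" engine is an assumption (the importance and num.threads arguments match ranger's naming, but the engine is not stated here), and the 80/20 split, the simulated matrix, and all object names are hypothetical.

```r
library(parsnip)

set.seed(1)

# Simulated cells-by-genes expression matrix (hypothetical stand-in for
# the expression data extracted from a Seurat object).
n_cells <- 200
n_genes <- 20
expr <- matrix(
  rnorm(n_cells * n_genes),
  nrow = n_cells,
  dimnames = list(NULL, paste0("gene", seq_len(n_genes)))
)
pseudotime <- runif(n_cells)

# Make one gene track pseudotime so there is a signal to recover.
expr[, "gene1"] <- pseudotime + rnorm(n_cells, sd = 0.1)

df <- data.frame(pseudotime = pseudotime, expr)

# Split cells into training and test sets (80/20 is an assumed ratio).
train_idx <- sample(n_cells, size = 0.8 * n_cells)
train_df <- df[train_idx, ]
test_df <- df[-train_idx, ]

# Specify and fit a regression random forest; the engine is an assumption.
spec <- rand_forest(mode = "regression", mtry = 5, trees = 500, min_n = 15)
spec <- set_engine(spec, "ranger", importance = "impurity", num.threads = 1)
rf_fit <- fit(spec, pseudotime ~ ., data = train_df)

# Evaluate on the held-out cells with root mean squared error.
pred <- predict(rf_fit, new_data = test_df)$.pred
rmse <- sqrt(mean((pred - test_df$pseudotime)^2))

# Rank genes by impurity importance from the underlying ranger fit.
imp <- sort(rf_fit$fit$variable.importance, decreasing = TRUE)
head(imp)
```

Genes near the top of the importance ranking are the candidates for pseudotime dependence; the held-out error indicates how much to trust that ranking.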

pseudotimeRF(
  so,
  hvg,
  pseudotimes,
  lineage.name,
  slot = "data",
  assay = DefaultAssay(so),
  mtry = length(hvg)/10,
  trees = 1000,
  min_n = 15,
  mode = "regression",
  importance = "impurity",
  num.threads = 3
)

Arguments

so

Seurat object.

hvg

Character vector of genes used to fit the model; each gene must be present in the rows of the Seurat object. Keeping the number of genes low (~200) is suggested for optimal performance.

pseudotimes

Numeric vector of pseudotimes. Its length must equal the number of cells in the Seurat object (ncol(so)).

lineage.name

Name of pseudotime lineage; used to label results.

slot

A character string specifying which slot to pull data from; default is 'data'.

assay

A character string specifying which assay to use (e.g., 'RNA' or 'SCT'). If unspecified, defaults to DefaultAssay(so).

mtry

An integer for the number of predictors that will be randomly sampled at each split when creating the tree models.

trees

An integer for the number of trees contained in the ensemble.

min_n

An integer for the minimum number of data points in a node that are required for the node to be split further.

mode

Specify the type of RF to fit: 'regression' or 'classification'. Regression is the default, and changing this argument is not recommended.

importance

Type of variable importance to compute. Default is 'impurity'.

num.threads

An integer for the number of threads to use when fitting the RF model.
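The constraints on hvg, pseudotimes, and mtry stated above can be checked before calling the function. The sketch below uses a plain genes-by-cells matrix as a hypothetical stand-in for the assay data of a Seurat object; dropping missing genes and flooring the default mtry to an integer are assumptions made for illustration, not documented behavior of pseudotimeRF.

```r
# Toy genes-by-cells matrix standing in for the assay data of a Seurat object.
set.seed(42)
expr <- matrix(
  rpois(5 * 8, lambda = 3),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8))
)

hvg <- c("gene1", "gene3", "gene9")                 # "gene9" is not in expr
pseudotimes <- seq(0, 1, length.out = ncol(expr))   # one value per cell

# Every gene in hvg must be present in the rows of the object.
missing_genes <- setdiff(hvg, rownames(expr))
if (length(missing_genes) > 0) {
  message("Dropping genes not found: ", paste(missing_genes, collapse = ", "))
  hvg <- intersect(hvg, rownames(expr))
}

# pseudotimes must be numeric with one entry per cell.
stopifnot(is.numeric(pseudotimes), length(pseudotimes) == ncol(expr))

# The default length(hvg)/10 is not always a whole number; flooring it and
# enforcing a minimum of 1 is an assumption made here.
mtry <- max(1L, as.integer(floor(length(hvg) / 10)))
```

With roughly 200 genes, as suggested for hvg, the default expression gives an mtry of 20.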

Value

List of results

See also

rand_forest