Identify pseudotime-dependent genes using Random Forest (RF) Model

Identifies pseudotime-dependent genes using a random forest (RF) model. Given a Seurat object and a set of highly variable genes (hvg), the expression data are split into training and test sets, and a random forest model is trained to identify genes that vary with pseudotime. Model performance is evaluated on the test set. The random forest model is fit using the R 'parsnip' package.
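The workflow described above can be sketched on simulated data. This is a minimal illustration, not the internals of pseudotimeRF: the 'ranger' engine, the 70/30 split, the simulated matrix, and the use of correlation as the test-set metric are all assumptions made for the example.

```r
library(parsnip)  # model specification interface; the 'ranger' package must also be installed

set.seed(1)

# Simulated stand-in for expression data: 300 cells (rows) x 20 genes (columns).
n_cells <- 300
n_genes <- 20
pseudotime <- runif(n_cells)
expr <- matrix(rnorm(n_cells * n_genes), nrow = n_cells,
               dimnames = list(NULL, paste0("gene", seq_len(n_genes))))

# Make two genes depend on pseudotime; the remaining genes are pure noise.
expr[, "gene1"] <- expr[, "gene1"] + 5 * pseudotime
expr[, "gene2"] <- expr[, "gene2"] - 5 * pseudotime

dat <- data.frame(pseudotime = pseudotime, expr)

# Split cells into training and test sets (70/30 is an arbitrary choice).
train_idx <- sample(n_cells, size = round(0.7 * n_cells))
train_df <- dat[train_idx, ]
test_df  <- dat[-train_idx, ]

# Specify a regression forest; engine arguments are passed through to ranger (assumed engine).
rf_spec <- rand_forest(mtry = 5, trees = 500, min_n = 15, mode = "regression")
rf_spec <- set_engine(rf_spec, "ranger",
                      importance = "impurity", num.threads = 1, seed = 1)

# Fit on the training cells: pseudotime is the outcome, genes are the predictors.
rf_fit <- fit(rf_spec, pseudotime ~ ., data = train_df)

# Evaluate on the held-out cells.
pred <- predict(rf_fit, new_data = test_df)
test_cor <- cor(pred$.pred, test_df$pseudotime)

# Rank genes by impurity importance; high-ranking genes vary with pseudotime.
imp <- sort(rf_fit$fit$variable.importance, decreasing = TRUE)
head(imp)
```

Genes with high importance are those the forest relies on to predict pseudotime, which is the sense in which they are "pseudotime-dependent" here.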

pseudotimeRF(
  so,
  hvg,
  pseudotimes,
  lineage.name,
  slot = "data",
  assay = DefaultAssay(so),
  mtry = length(hvg)/10,
  trees = 1000,
  min_n = 15,
  mode = "regression",
  importance = "impurity",
  num.threads = 3
)

Arguments

so: Seurat Object
hvg: Genes used to fit the model (character vector; all genes must be present in the rows of the Seurat object). Keeping the number of genes low (~200) is suggested for optimal performance.
pseudotimes: Numeric vector of pseudotimes. Length must equal the number of cells in the Seurat object (ncol(so)).
lineage.name: Name of pseudotime lineage; used to label results.
slot: A character string specifying which slot to pull data from. Default is 'data'.
assay: A character string specifying which assay to use (e.g., 'RNA' or 'SCT'). If unspecified, set to DefaultAssay(so).
mtry: An integer for the number of predictors that will be randomly sampled at each split when creating the tree models.
trees: An integer for the number of trees contained in the ensemble.
min_n: An integer for the minimum number of data points in a node that are required for the node to be split further.
mode: Specify the type of RF to fit: 'regression' or 'classification'. Regression is the default, and changing this argument is not recommended.
importance: Type of importance. Default is 'impurity'.
num.threads: An integer for the number of threads to use when fitting RF model
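The constraints on hvg, pseudotimes, and mtry listed above can be checked before calling the function. The sketch below uses a plain genes-by-cells matrix as a stand-in for the expression data in a Seurat object; the object names and the rounding of mtry are assumptions for illustration, since the source does not state whether pseudotimeRF rounds the default length(hvg)/10 internally.

```r
# Toy genes-by-cells matrix standing in for the expression data of a Seurat object.
set.seed(1)
expr <- matrix(rpois(50 * 40, lambda = 2), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:40)))

hvg <- c("gene1", "gene7", "gene12", "geneX")      # "geneX" is deliberately absent
pseudotimes <- seq(0, 1, length.out = ncol(expr))  # one value per cell

# hvg must be present in the rows of the object: report and drop missing genes.
missing_genes <- setdiff(hvg, rownames(expr))
hvg_use <- intersect(hvg, rownames(expr))

# pseudotimes must be numeric with one entry per cell.
stopifnot(is.numeric(pseudotimes), length(pseudotimes) == ncol(expr))

# length(hvg)/10 is not necessarily a whole number; round it and keep it at least 1.
mtry <- max(1L, as.integer(round(length(hvg_use) / 10)))
```

Passing an explicit integer mtry avoids relying on how a fractional default is handled downstream.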

Value

List of results
