Description | Variable Name | Value |
---|---|---|
Module | Preprocessing & QC | |
Date | Sys.time() | 2022-01-24 15:40:13 |
Input Expression Matrix | input_data | C:/Users/Owner/Dropbox/PDF Projects - JM/R Packages/scMiko/data/zeisel_count_matrix.rds |
gene/cell Upper Limit (N/cell) | gene.upperlimit | 9000 |
gene/cell Lower Limit (N/cell) | gene.lowerlimit | 200 |
Filtered by unmatched reads | unmatched.rate.filter.flag | TRUE |
Mitochondrial Content Upper Limit (%/cell) | mt.upperlimit | 10 |
Data Subsampled | subsampled | FALSE |
Cluster Resolution | cluster.resolution | 1 |
Filtered by species | species.filter.flag | FALSE |
Number of Workers | normScale | 1 |
Variables regressed out | vars2regress | percent.mt |
Input Species | all_input_organisms | |
Feature selection method | feature.select.method | hvg |
PCA component selection method | pca.component.select.method | cum_var |
Normalization Method | norm_method | SCT |
PCA Method | pca.method | pca |
Principal Components Included | nDim | 33 |
Unmatched rate upper limit | unmatch.high | 1 |
Unmatched rate lower limit | unmatch.low | 0 |
nPCA components used | nDim | 33 |
Run Time (s) | elapsed.time | 157.9 |
Run Identifier | run.id | output_34245PM |
Scaling method | scale.method | sct |
Output Saved | save.flag | TRUE |
Description | Variable Name | Value |
---|---|---|
Module | Cluster Optimization | |
Date | Sys.time() | 2022-01-24 16:31:13 |
Input File | input.file | C:/Users/Owner/Dropbox/PDF Projects - JM/Data/scRNA-seq/01_sci-RNA-seq3_Hong_Kevin_Jason/NM_HH/Data/Preprocessed_Datasets/output_34245PM_demo_zeisel_10000cell.Rdata |
Group singletons when clustering | group.singletons | FALSE |
Log FC Threshold | lfc.threshold | 0.5 |
p value Threshold | fdr.threshold | 0.05 |
Positive Markers Only | only.pos | TRUE |
Cluster Resolution | cluster.resolution | 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 3 |
Subsample Factor | subsample_factor | 1 |
Subsample N | subsample_n | 10000 |
Figures Printed in Notebook | print.inline | FALSE |
PDF saved | save.pdf | FALSE |
Run Time (s) | elapsed.time | 569.69 |
Run Identifier | run.id | output_43915PM |
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: LC_COLLATE=English_Canada.1252, LC_CTYPE=English_Canada.1252, LC_MONETARY=English_Canada.1252, LC_NUMERIC=C and LC_TIME=English_Canada.1252
attached base packages: parallel, grid, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: presto(v.1.0.0), data.table(v.1.14.0), Rcpp(v.1.0.7), igraph(v.1.2.6), doParallel(v.1.0.16), iterators(v.1.0.13), foreach(v.1.5.1), cluster(v.2.1.0), future(v.1.22.1), zoo(v.1.8-9), ggExtra(v.0.9), ggpmisc(v.0.4.3), ggpp(v.0.4.2), DT(v.0.19), gridExtra(v.2.3), RColorBrewer(v.1.1-2), reshape2(v.1.4.4), plyr(v.1.8.6), scMiko(v.0.1.0), flexdashboard(v.0.5.2), tidyr(v.1.1.3), SeuratObject(v.4.0.2), Seurat(v.4.0.4), dplyr(v.1.0.7) and ggplot2(v.3.3.5)
loaded via a namespace (and not attached): readxl(v.1.3.1), backports(v.1.2.1), lazyeval(v.0.2.2), splines(v.4.0.3), crosstalk(v.1.1.1), listenv(v.0.8.0), scattermore(v.0.7), digest(v.0.6.27), yulab.utils(v.0.0.2), htmltools(v.0.5.2), viridis(v.0.6.2), fansi(v.0.5.0), magrittr(v.2.0.1), tensor(v.1.5), ROCR(v.1.0-11), openxlsx(v.4.2.4), globals(v.0.14.0), matrixStats(v.0.61.0), spatstat.sparse(v.2.0-0), colorspace(v.2.0-2), ggrepel(v.0.9.1), haven(v.2.4.3), xfun(v.0.26), crayon(v.1.4.1), jsonlite(v.1.7.2), spatstat.data(v.2.1-0), survival(v.3.2-13), glue(v.1.4.2), polyclip(v.1.10-0), gtable(v.0.3.0), MatrixModels(v.0.5-0), leiden(v.0.3.9), car(v.3.0-11), future.apply(v.1.8.1), abind(v.1.4-5), SparseM(v.1.81), scales(v.1.1.1), pheatmap(v.1.0.12), DBI(v.1.1.1), ggthemes(v.4.2.4), rstatix(v.0.7.0), miniUI(v.0.1.1.1), viridisLite(v.0.4.0), xtable(v.1.8-4), gridGraphics(v.0.5-1), reticulate(v.1.20), spatstat.core(v.2.3-0), foreign(v.0.8-80), htmlwidgets(v.1.5.4), httr(v.1.4.2), ellipsis(v.0.3.2), factoextra(v.1.0.7), ica(v.1.0-2), pkgconfig(v.2.0.3), farver(v.2.1.0), sass(v.0.4.0), uwot(v.0.1.10), deldir(v.0.2-10), utf8(v.1.2.2), ggplotify(v.0.1.0), tidyselect(v.1.1.1), labeling(v.0.4.2), rlang(v.0.4.11), later(v.1.3.0), cellranger(v.1.1.0), munsell(v.0.5.0), tools(v.4.0.3), generics(v.0.1.0), broom(v.0.7.9), ggridges(v.0.5.3), evaluate(v.0.14), stringr(v.1.4.0), fastmap(v.1.1.0), yaml(v.2.2.1), goftest(v.1.2-2), knitr(v.1.36), fs(v.1.5.0), fitdistrplus(v.1.1-5), pander(v.0.6.4), zip(v.2.2.0), purrr(v.0.3.4), RANN(v.2.6.1), pbapply(v.1.5-0), nlme(v.3.1-149), mime(v.0.11), quantreg(v.5.86), compiler(v.4.0.3), curl(v.4.3.2), plotly(v.4.9.4.1), png(v.0.1-7), ggsignif(v.0.6.3), spatstat.utils(v.2.2-0), tibble(v.3.1.4), bslib(v.0.3.0), stringi(v.1.7.4), highr(v.0.9), forcats(v.0.5.1), lattice(v.0.20-41), Matrix(v.1.3-4), vctrs(v.0.3.8), pillar(v.1.6.4), lifecycle(v.1.0.1), spatstat.geom(v.2.2-2), lmtest(v.0.9-38), jquerylib(v.0.1.4), RcppAnnoy(v.0.0.19), cowplot(v.1.1.1), irlba(v.2.3.3), conquer(v.1.0.2), httpuv(v.1.6.3), patchwork(v.1.1.1), R6(v.2.5.1), promises(v.1.2.0.1), rio(v.0.5.27), KernSmooth(v.2.23-17), parallelly(v.1.28.1), codetools(v.0.2-16), MASS(v.7.3-53), assertthat(v.0.2.1), withr(v.2.4.2), sctransform(v.0.3.2), hms(v.1.1.1), mgcv(v.1.8-33), rpart(v.4.1-15), rmarkdown(v.2.11), carData(v.3.0-4), Rtsne(v.0.15), ggpubr(v.0.4.0) and shiny(v.1.6.0)
---
title: "Cluster Optimization"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
flexdashboard::flex_dashboard:
orientation: rows
vertical_layout: fill
source_code: embed
theme:
bootswatch: flatly
navbar:
- { title: "scMiko", href: "https://nmikolajewicz.github.io/scMiko/" }
editor_options:
chunk_output_type: inline
knit: (function(inputFile, encoding) {
rmarkdown::render(input = inputFile,
encoding = encoding,
output_file = if (exists("user")){paste0(
xfun::sans_ext(inputFile), '_',user, "_",
paste0(format(Sys.time(), "%d_%b_"), gsub(" ", "_", gsub(":", "_", format(Sys.time(), "%X"))), format(Sys.time(), "_%Y")), '.html'
)} else {paste0(xfun::sans_ext(inputFile), '_',"Guest", "_",
paste0(format(Sys.time(), "%d_%b_"), gsub(" ", "_", gsub(":", "_", format(Sys.time(), "%X"))), format(Sys.time(), "_%Y")), '.html')},
output_dir = if (exists("data.path")) paste0(data.path, "/HTML_Reports") else NULL
)
})
---
```{r setup, include=FALSE}
# clear global enviroment
rm(list = setdiff(ls(), c("data.path", "user")))
invisible({gc()})
# initiate timer
start.time <- proc.time()
# List of packages to load
packages2load <- c("scMiko", "Seurat",
"plyr", "dplyr", "tidyr", "reshape2", "RColorBrewer", "ggplot2", "gridExtra",
"DT", "flexdashboard", "ggpmisc", "ggpmisc", "ggExtra", "grid",
"zoo", "future", "cluster", "doParallel", "parallel", "foreach", "igraph", "presto")
# load packages
invisible(lapply(packages2load, library, character.only = TRUE))
```
```{r parameter specification}
parameter.list <- list(
input.file = "C:/Users/Owner/Dropbox/PDF Projects - JM/Data/scRNA-seq/01_sci-RNA-seq3_Hong_Kevin_Jason/NM_HH/Data/Preprocessed_Datasets/output_34245PM_demo_zeisel_10000cell.Rdata",
cluster.resolution = c(0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 3),
lfc.threshold = 0.5,
fdr.threshold = 0.05,
group.singletons = F,
print.inline = FALSE,
subsample_factor = 1,
subsample_n = 10000,
n.workers =list(
clustering = 4,
deg = 4
),
only.pos = T,
save.pdf = F,
update.log = F,
rprofile.dir = F,
developer = F
)
```
```{r analysis log}
miko_message("Updating analysis log...")
# Module
df.log <- initiateLog("Cluster Optimization")
df.log <- addLogEntry("Input File", parameter.list$input.file, df.log, "input.file")
df.log <- addLogEntry("Group singletons when clustering", parameter.list$group.singletons, df.log, "group.singletons")
df.log <- addLogEntry("Log FC Threshold", parameter.list$lfc.threshold, df.log, "lfc.threshold")
df.log <- addLogEntry("p value Threshold", parameter.list$fdr.threshold, df.log, "fdr.threshold")
df.log <- addLogEntry("Positive Markers Only", parameter.list$only.pos, df.log, "only.pos")
df.log <- addLogEntry("Cluster Resolution", parameter.list$cluster.resolution, df.log, "cluster.resolution")
df.log <- addLogEntry("Subsample Factor", parameter.list$subsample_factor, df.log, "subsample_factor")
df.log <- addLogEntry("Subsample N", parameter.list$subsample_n, df.log, "subsample_n")
df.log <- addLogEntry("Figures Printed in Notebook", parameter.list$print.inline, df.log, "print.inline")
df.log <- addLogEntry("PDF saved", parameter.list$save.pdf, df.log, "save.pdf")
```
```{r load data, warning = FALSE}
# Specify data directories
if (parameter.list$rprofile.dir){
dir.preprocessed <- "Preprocessed_Datasets/"
if (!exists("data.path") & !exists("user")) {
stop("data.path and user variables do not exist in global enviroment. Ensure that these are specified in .Rprofile. If specified, try restarting Rstudio.")
}
}
miko_message("Importing data...")
if ((!grepl(".Rdata|.RData", parameter.list$input.file)) & !(grepl(".rds", parameter.list$input.file))){
parameter.list$input.file <- paste0(parameter.list$input.file, ".Rdata")
}
if (parameter.list$rprofile.dir){
if (grepl(".Rdata|.RData", parameter.list$input.file)){
load(paste(data.path, dir.preprocessed, parameter.list$input.file, sep = ""));
} else if (grepl(".rds", parameter.list$input.file)) {
so <- readRDS(paste(data.path, dir.preprocessed, parameter.list$input.file, sep = ""))
}
} else {
if (grepl(".Rdata|.RData", parameter.list$input.file)){
load(parameter.list$input.file);
} else if (grepl(".rds", parameter.list$input.file)) {
so <- readRDS(parameter.list$input.file);
}
}
if (!exists("gNames.list")) gNames.list <- prepGeneList(so, objects())
if ("integrated" %in% names(so@assays)){
terms2drop <- c("ica", "tsne", "nmf", "corr", "gsva", "deg", "sct")
} else {
terms2drop <- c("ica", "tsne", "nmf", "corr", "gsva", "deg")
}
# prep seurat object
prep.list <- prepSeurat2(so, e2s = gNames.list,
species = NULL, resolution= NULL, subset = NULL,
subsample = 1,
terms2drop = terms2drop, rmv.pattern = "so",
scale.reprocessed = F,
neighbors.reprocessed = T)
# unpack results
if (exists("so")) try({rm(so)}, silent = T)
so.query <- prep.list$so
current.assay <- prep.list$assay
n.cells <- prep.list$n.cell
rm(prep.list)
invisible({gc()})
parameter.list$species <- detectSpecies(so.query)
```
```{r prior log}
# determine prior log history
cur.env <- objects()
module.logs <- cur.env[grep("^df.log_Module",cur.env)]
```
```{r cluster helper function}
# clustering data
mc.list <- multiCluster(object = so.query,
resolutions = parameter.list$cluster.resolution,
assay = NULL, nworkers = parameter.list$n.workers$clustering ,
pca_var = 0.9,
group_singletons = parameter.list$group.singletons,
algorithm = 1,
return_object = F)
plt.umap_by_cluster <- mc.list$plots
so.query <- mc.list$object
cr_names <- mc.list$resolution_names
cluster.name <- mc.list$cluster_names
assay.pattern <- mc.list$assay_pattern
rm(mc.list); invisible({gc()})
```
```{r calculate cluster purity}
miko_message("Calculating purity scores...")
availableNNgraph <- function(object, verbose = T){
suppressMessages({
suppressWarnings({
graph.present <- F
try({
cluster.graph <- object@commands[["FindClusters"]]@params[["graph.name"]]
cluster.graph <- gsub("_s", "_", cluster.graph)
graph.present <- cluster.graph %in% names(object@graphs)
}, silent = T)
if (!graph.present){
try({
cluster.graph <- paste0(current.assay, "_nn")
graph.present <- cluster.graph %in% names(object@graphs)
}, silent = T)
}
if (!graph.present) stop("Could not find valid KNN graph")
})
})
miko_message("'", cluster.graph, "' is available for neighborhood purity scoring")
return(cluster.graph)
}
multiPurity <- function(object, resolutions, cluster_graph = availableNNgraph(object),
n_subsample = 10000, pca_var = 0.9, assay_pattern = NULL, assay = NULL,
purity_threshold = 0.9){
df.purity <- NULL
purity.umap.list <- list()
if (is.null(assay_pattern) & is.null(assay)){
assay_pattern <- paste0(DefaultAssay(object), "_snn_res.")
} else if (is.null(assay_pattern) & !is.null(assay)){
assay_pattern <- paste0(assay, "_snn_res.")
}
cluster_graph_set <- cluster_graph
# downsample data if exceeding 10000 cells
if (ncol(object) > n_subsample){
so.sub <- object[ , sample(colnames(object), n_subsample)]
df.pc <- propVarPCA(so.sub)
so.sub <- FindNeighbors(so.sub, dims = 1:(sum(df.pc$pc.cum_sum < pca_var) + 1), nn.method = "rann")
}
for (i in 1:length(resolutions)) {
current.cluster <- paste0(assay_pattern, resolutions[i])
if (ncol(object) > n_subsample){
so.sub <- neighborPurity(object = so.sub, graph = cluster_graph_set, cluster.field = current.cluster, verbose = F)
suppressMessages({
suppressWarnings({
purity.umap.list[[current.cluster]] <- FeaturePlot(so.sub, "purity") +
theme_miko(legend = T) +
viridis::scale_color_viridis() +
labs(title = "Purity Score", subtitle = paste0("Resolution: ", resolutions[i])) +
labs(x = "UMAP 1", y = "UMAP 2", color = "Purity")
})
})
df.purity <- bind_rows(df.purity,
data.frame(resolution = as.numeric(resolutions[i]),
name = current.cluster,
cell = colnames(so.sub),
cluster = so.sub@meta.data[ ,current.cluster],
purity = so.sub@meta.data[ ,"purity"]))
} else {
# so.sub <- object; rm(object); invisible({gc()})
object <- neighborPurity(object = object, graph = cluster_graph_set, cluster.field = current.cluster, verbose = F)
suppressMessages({
suppressWarnings({
purity.umap.list[[current.cluster]] <- FeaturePlot(object, "purity") +
theme_miko(legend = T) +
viridis::scale_color_viridis() +
labs(title = "Purity Score", subtitle = paste0("Resolution: ", resolutions[i])) +
labs(x = "UMAP 1", y = "UMAP 2", color = "Purity")
})
})
df.purity <- bind_rows(df.purity,
data.frame(resolution = as.numeric(resolutions[i]),
name = current.cluster,
cell = colnames(object),
cluster = object@meta.data[ ,current.cluster],
purity = object@meta.data[ ,"purity"]))
}
}
df.purity.sum <- df.purity %>%
dplyr::group_by(resolution, cluster) %>%
dplyr::summarize(purity.mean1 = mean(purity, na.rm = T),
purity.median1 = median(purity, na.rm = T),
purity.sd1 = sd(purity, na.rm = T), .groups = 'drop') %>%
dplyr::group_by(resolution) %>%
dplyr::summarize(purity.mean = mean(purity.mean1, na.rm = T),
purity.min = min(purity.mean1),
purity.max = max(purity.mean1),
purity.sd = sd(purity.mean1, na.rm = T), .groups = 'drop')
df.purity.sum$z <- (1 - df.purity.sum$purity.mean)/df.purity.sum$purity.sd
df.purity.sum$z[is.na(df.purity.sum$z)] <- 0
df.purity.sum$p <- pnorm(q=df.purity.sum$z, lower.tail=FALSE)
plt.cluster.purity <- df.purity.sum %>%
ggplot(aes(x = as.numeric(resolution), y = purity.mean)) +
geom_line() +
# scale_color_manual(values = c("TRUE" = "tomato", "FALSE" = "black")) +
geom_point(size = 3) + #, aes(color = p < p_threshold)
geom_hline(yintercept = purity_threshold, linetype = "dashed") +
theme_miko(legend = T) +
theme(legend.position = "bottom") +
labs(x = "Resolution", y = "Purity Score", title = "Purity Score") + #, color = paste0("Impure (p<", p_threshold, ")")
theme(panel.grid.minor = element_line(colour="grey95", size=0.1),
panel.grid.major = element_line(colour="grey85", size=0.1),
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank()) +
scale_x_continuous(minor_breaks = seq(0 , max(df.purity.sum$resolution, na.rm = T), 0.1) ,
breaks = seq(0, max(df.purity.sum$resolution, na.rm = T), 0.2))
return(
list(
purity_raw = df.purity,
purity_summary = df.purity.sum,
resolution_plot = plt.cluster.purity,
umap_plot = purity.umap.list
)
)
}
mp.list <- multiPurity(object = so.query,
resolutions = parameter.list$cluster.resolution,
cluster_graph = availableNNgraph(so.query),
n_subsample = 10000,
pca_var = 0.9,
assay_pattern = assay.pattern, assay = NULL,
purity_threshold = 0.9) #, p_threshold = 0.05
df.purity <- mp.list$purity_raw
df.purity.sum <- mp.list$purity_summary
plt.cluster.purity <- mp.list$resolution_plot
purity.umap.list <- mp.list$umap_plot
rm(mp.list); invisible({gc()})
if (parameter.list$print.inline){
print(plt.cluster.purity)
}
```
```{r subsample data here, warning = FALSE}
try({
if ((parameter.list$subsample_factor < 1) | (parameter.list$subsample_n < ncol(so.query))){
so.query <- downsampleSeurat(object = so.query, subsample.factor = parameter.list$subsample_factor, subsample.n = parameter.list$subsample_n)
}
}, silent = T)
```
```{r check gene rep and convert if necessary, warning = FALSE}
current.assay <- DefaultAssay(so.query)
if (!is.null(gNames.list)){
gene.rep <- checkGeneRep(gNames.list, as.vector(rownames(so.query)))
if (gene.rep == "ensembl"){
so.query <- ens2sym.so(so.query, gNames.list)
}
}
```
```{r find DEG markers, warning=FALSE}
multiDEG <- function(object, groups, nworkers = 1, fdr_threshold = 0.05, logfc_threshold = 0, only_pos = T, verbose = T ){
# initiate list to store cluster markers
query_markers.list <- list()
# store current identities
cur_idents <- Idents(object)
# so.markers <- so.query
# assay for DEG
deg_assay <- c("RNA", "SCT")[which(c("RNA", "SCT") %in% names(object@assays)) ]
if (length(deg_assay) == 0){
deg_assay <- DefaultAssay(object)
} else if (length(deg_assay) > 1){
deg_assay <- "SCT"
}
which_missing <- which(!(groups %in% colnames(object@meta.data)))
if (length(which_missing) > 0) stop(paste(groups[which_missing], collapse =", "), " groups were not found in object meta data.")
# start cluster
miko_message("Running Differential expression analysis...", verbose = verbose)
if (nworkers > length(groups)) nworkers <- length(groups)
cl <- parallel::makeCluster(nworkers)
doParallel::registerDoParallel(cl)
# iterate through each input file
marker.list <- foreach(i = 1:length(groups), .packages = c("scMiko", "Seurat", "presto", "dplyr")) %dopar% {
Idents(object) <- object@meta.data[[groups[i]]]
if (length(unique(Idents(object))) < 2){
qm.res <- NULL
} else {
qm.res <- presto::wilcoxauc(X = object, group_by = groups[i], assay = "data", seurat_assay = deg_assay)
qm.res$pct_dif <- qm.res$pct_in - qm.res$pct_out
}
return(qm.res)
}
# stop workers
parallel::stopCluster(cl)
# unpackage results
query_markers.list <- marker.list
names(query_markers.list) <- groups
# remove baggage
rm(marker.list)
rm(object)
invisible({gc()})
# all.deg.list <- query_markers.list
miko_message("Applying filters...", verbose = verbose)
for (i in 1:length(query_markers.list)){
set.name <- names(query_markers.list)[i]
qm.res <- query_markers.list[[set.name]]
if (only_pos){
qm.res <- qm.res %>% dplyr::mutate(significant = (logFC > 0 & padj <= fdr_threshold & abs(logFC) > logfc_threshold))
} else {
qm.res <- qm.res %>% dplyr::mutate(significant = (padj <= fdr_threshold & abs(logFC) > logfc_threshold))
}
query_markers.list[[set.name]] <- qm.res
}
return(query_markers.list)
}
# run DEG analysis
all.deg.list <- multiDEG(object = so.query, groups = cluster.name,
only_pos = parameter.list$only.pos,
nworkers = parameter.list$n.workers$deg,
fdr_threshold = parameter.list$fdr.threshold,
logfc_threshold = parameter.list$lfc.threshold, verbose = T )
query_markers.list <- lapply(all.deg.list, function(x) x %>% dplyr::filter(significant))
```
```{r CDI-based specificity score}
ms.list <- multiSpecificity(object = so.query,
cluster_names = cluster.name,
features = NULL,
deg_prefilter = T,
cdi_bins = seq(0, 1, by = 0.01),
min.pct = 0.1,
n.workers = parameter.list$n.workers$deg,
return_dotplot = T,
verbose = T)
# ms.list$
df.auc.spec <- ms.list$specificity_summary
qm.res.sum.sum.all <- ms.list$specificity_raw
plt.clust.spec <- ms.list$auc_plot
plt.auc.spec <- ms.list$resolution_plot
plt.auc.dot <- ms.list$dot_plot
cdi_result <- ms.list$cdi_results
if (parameter.list$print.inline){
print(plt.clust.spec)
print(plt.auc.spec)
}
```
```{r make wide helper function}
make.wide <- function(u_clusters, top_markers, n_top_markers, rename_clusters = TRUE){
top_markers_wide <- data.frame()
top_marker_clusters <- as.vector(top_markers$group)
u_clusters <- as.numeric(as.character(u_clusters))
u_clusters <- u_clusters[order(u_clusters)]
for (i in 1:length(u_clusters)){
top_markers_current <- top_markers$feature[top_marker_clusters == u_clusters[i]]
n_markers_cur <- length(top_markers_current)
if (!(n_markers_cur >= n_top_markers)){
top_markers_current <- c(top_markers_current, rep(NaN, n_top_markers-n_markers_cur))
}
top_markers_wide[seq(1, n_top_markers), c(paste("c", u_clusters[i], sep = ""))] <- as.data.frame(top_markers_current)
if (!rename_clusters){
colnames(top_markers_wide)[i] <- u_clusters[i]
}
}
return(top_markers_wide)
}
```
```{r get top markers function}
# function to get top cluster-specific markers
get.topmarkers <- function(query.markers, n_top_markers = 10, cluster_res_label = "", rename_clusters = TRUE){
top_markers <- query.markers %>%
dplyr::group_by(group) %>%
dplyr::arrange(desc(auc))
top_markers_filter <- top_markers[ , c("group", "feature")]
top_marker.tally <- top_markers_filter %>%
dplyr::group_by(group) %>%
dplyr::tally()
max.n <- max(top_marker.tally$n)
top_markers_filter$group <- as.numeric(as.character(top_markers_filter$group))
top_markers$group <- paste("c", as.character(top_markers$group), sep = "")
top_markers$pval <- signif( top_markers$pval, 3)
top_markers$padj <- signif( top_markers$padj, 3)
top_markers$logFC <- signif( top_markers$logFC, 3)
top_markers_50 <- query.markers %>%
dplyr::group_by(group) %>%
dplyr::arrange(desc(auc)) %>%
dplyr::top_n(n = 50, wt = auc)
top_markers_50 <- top_markers_50[ , c("group", "feature")]
u_clusters <- as.vector(unique(top_markers_filter$group))
u_clusters <- u_clusters[order(u_clusters)]
top_markers_wide <- make.wide(u_clusters,
top_markers = top_markers_filter,
n_top_markers = max.n,
rename_clusters = rename_clusters)
top_markers_50_wide <- make.wide(u_clusters, top_markers = top_markers_50, n_top_markers = 50, rename_clusters = rename_clusters)
top5_markers <- query.markers %>%
dplyr::group_by(group) %>%
dplyr::arrange(desc(auc)) %>%
dplyr::top_n(n = 50, wt = auc)
# assign outputs to list and return
output <- list(top_markers, top_markers_50, top_markers_wide, top_markers_50_wide, top5_markers)
names(output) <- c("top_markers_res",
"top_markers_50_res",
"top_markers_wide_res",
"top_markers_50_wide_res",
"top5_markers_res")
return(output)
}
```
```{r get top cluster markers, warning = FALSE }
# get top markers
which.res2rmv <- c()
top_markers_res.list <- list()
top_markers_50_res.list <- list()
top_markers_wide_res.list <- list()
top_markers_50_wide_res.list <- list()
top5_markers_res.list <- list()
for (i in 1:length(query_markers.list)){
if (length( query_markers.list[[i]]) == 0) {
which.res2rmv <- c(which.res2rmv, parameter.list$cluster.resolution[i])
next
}
output.markers <- get.topmarkers(query.markers = query_markers.list[[i]], cluster_res_label = parameter.list$cluster.resolution[i])
top_markers_res.list[[as.character(parameter.list$cluster.resolution[i])]] <- as.data.frame(output.markers[["top_markers_res"]])
top_markers_50_res.list[[as.character(parameter.list$cluster.resolution[i])]] <- as.data.frame(output.markers[["top_markers_50_res"]])
top_markers_wide_res.list[[as.character(parameter.list$cluster.resolution[i])]] <- as.data.frame(output.markers[["top_markers_wide_res"]])
top_markers_50_wide_res.list[[as.character(parameter.list$cluster.resolution[i])]] <- as.data.frame(output.markers[["top_markers_50_wide_res"]])
top5_markers_res.list[[as.character(parameter.list$cluster.resolution[i])]] <- as.data.frame(output.markers[["top5_markers_res"]])
# quiet(list2env(output.markers, env = environment()), all = TRUE)
}
# omit resolution where no DEGs were observed
cr_original <- parameter.list$cluster.resolution
parameter.list$cluster.resolution <-parameter.list$cluster.resolution[!(parameter.list$cluster.resolution %in% which.res2rmv)]
```
```{r down sample cells function}
# downsample data if necessary
downsample_cells <- function(so.query, clust.name, n.downsample = 500, seed.value = 1023){
set.seed(seed.value)
cluster_id <- as.vector(so.query@meta.data[[clust.name]])
u_clusters <- unique(cluster_id)
n_per_cluster <- round(n.downsample / length(u_clusters))
cells_to_plot <- c()
for (i in 1:length(u_clusters)){
cell_to_sample <- Cells(so.query)[cluster_id == u_clusters[i]]
if (length(cell_to_sample) < n_per_cluster){
cells_to_plot <- c(cells_to_plot, cell_to_sample)
} else {
cells_to_plot <- c(cells_to_plot, sample(Cells(so.query)[cluster_id == u_clusters[i]], n_per_cluster, replace = FALSE))
}
}
so.query_small <- subset(so.query, cells = cells_to_plot)
return(so.query_small)
}
```
```{r heatmap new}
plt.heat.scaled.list <- list()
for (i in 1:length(all.deg.list)){
u.clust <- getOrderedGroups(so.query, which.group = names(all.deg.list)[i], is.number = T)
u.clust <- u.clust[!is.na(u.clust)]
df.prest <- all.deg.list[[names(all.deg.list)[i]]]
e.mat <- pivot_wider(df.prest %>% dplyr::select(feature, group, avgExpr), names_from = group, values_from = avgExpr)
e.mat <- col2rowname(e.mat, "feature"); e.mat <- as.matrix(e.mat[ ,order(as.numeric(colnames(e.mat)))])
df.e.wide <- data.frame(gene = rownames(e.mat), as.data.frame(e.mat)); colnames(df.e.wide) <- c("gene", u.clust)
# deg.gene <- query_markers.list[[names(all.deg.list)[i]]]$feature
var.features <- so.query@assays[[DefaultAssay(so.query)]]@var.features
# scaled expression plots
df.s4plt <- as.matrix(df.e.wide %>% dplyr::select(-c("gene")))
df.s4plt <- df.s4plt[rownames(df.s4plt) %in% unique(var.features), ]
# df.s4plt <- (apply(df.s4plt, 2, function(x) (x-mean(x))/sd(x)))
df.s4plt <- t(apply(df.s4plt, 1, function(x) (x-mean(x, na.rm = T))/sd(x, na.rm = T)))
scale.lim <- 2
df.s4plt[df.s4plt > scale.lim] <- scale.lim
df.s4plt[df.s4plt < (-1*scale.lim)] <- (-1*scale.lim)
my.breaks <- seq((-1*scale.lim), scale.lim, by = 0.01)
# my.col <- colorRampPalette(c(scales::muted("blue"), "white",scales::muted("red")))(length(my.breaks))
df.s4plt <- df.s4plt[!(apply(df.s4plt, 1, function(x) all(is.na(x)))), ]
plt.heat.scaled <- pheatmap::pheatmap(df.s4plt,
breaks = my.breaks,
color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdBu")))(length(my.breaks)),
show_rownames = F,
main = "Scaled Expression\nx=cluster, y=gene, z=scaled expression", silent = T)
plt.heat.scaled <- ggplotify::as.ggplot(plt.heat.scaled)
rm(df.s4plt);
plt.heat.scaled.list[[ names(all.deg.list)[i]]] <- plt.heat.scaled
}
```
```{r silhouette analysis}
if (exists("so.sub")){
msil_list <- multiSilhouette(object = so.sub, groups = cluster.name, assay_pattern = assay.pattern, verbose = T)
} else {
msil_list <- multiSilhouette(object = so.query, groups = cluster.name, assay_pattern = assay.pattern, verbose = T)
}
sil.plot <- msil_list$silhouette_plots
plt.silw.dep <- msil_list$resolution_plot
df.silw <- msil_list$silhouette_raw
df.silw.sum <- msil_list$silhouette_summary
rm(msil_list); invisible({gc()})
if (parameter.list$print.inline){
print(sil.plot)
print(plt.silw.dep)
}
```
```{r res vs n clust curve}
# get resolutions
clust.names <- names(query_markers.list)
current.assay <- DefaultAssay(so.query)
clust.res <- as.numeric(gsub(assay.pattern, "",clust.names))
# get cluster number
clust.n <- purrr::map_dbl(query_markers.list, function(x) length(unique(x[["group"]])))
# get number of degs per cluster
n.deg <- purrr::map(query_markers.list, function(x){
if (nrow(x) > 0){
df.sum <- x %>%
dplyr::group_by(get("group")) %>%
dplyr::summarize(
sig.deg = sum(padj < 0.05), .groups = 'drop'
)
} else {
NULL
}
})
# compute deg statistics
n.deg.summary <- purrr::map(n.deg, function(x){
if (!is.null(x)){
df.sum <- data.frame(
min.n = min(x$sig.deg, na.rm = T),
max.n = max(x$sig.deg, na.rm = T),
mean.n =mean(x$sig.deg, na.rm = T),
sd.n = sd(x$sig.deg, na.rm = T),
median.n = median(x$sig.deg, na.rm = T),
n.clust = length(x$sig.deg)
)
} else {
df.sum <- data.frame(
min.n = NA,
max.n = NA,
mean.n = NA,
sd.n = NA,
median.n = NA,
n.clust = NA
)
}
})
df.clust.n <- data.frame(clust.res = clust.res,
n.clust = clust.n,
deg.meanN = purrr::map_dbl(n.deg.summary, function(x) x$mean.n),
deg.medianN = purrr::map_dbl(n.deg.summary, function(x) x$median.n),
deg.minN = purrr::map_dbl(n.deg.summary, function(x) x$min.n),
deg.maxN = purrr::map_dbl(n.deg.summary, function(x) x$max.n))
df.clust.mean <- df.clust.n[ ,c("clust.res", "deg.meanN")]; df.clust.mean$type = "mean"; colnames(df.clust.mean) <- c("res", "val", "type")
df.clust.min <- df.clust.n[ ,c("clust.res", "deg.minN")]; df.clust.min$type = "min"; colnames(df.clust.min) <- c("res", "val", "type")
df.clust.max <- df.clust.n[ ,c("clust.res", "deg.maxN")]; df.clust.max$type = "max"; colnames(df.clust.max) <- c("res", "val", "type")
df.clust.long <- bind_rows(df.clust.mean, df.clust.min, df.clust.max)
plt.deg.n <- df.clust.long %>%
dplyr::filter(type == "min") %>%
ggplot(aes(res, val)) +
# , color = type
geom_point(size = 3) +
geom_line() +
geom_hline(yintercept = 0, linetype = "dashed") +
theme_miko(legend = T) +
ggthemes::scale_color_ptol() +
labs(title = "Number of Differentially-Expressed Genes (DEG)", x = "Resolution", y = "min(N DEG/Cluster)", color = "Statistic") +
theme(panel.grid.minor = element_line(colour="grey95", size=0.1),
panel.grid.major = element_line(colour="grey85", size=0.1),
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank()) +
scale_x_continuous(minor_breaks = seq(0 , max(df.purity.sum$resolution, na.rm = T), 0.1) ,
breaks = seq(0, max(df.purity.sum$resolution, na.rm = T), 0.2))
plt.clust.n <- df.clust.n %>%
ggplot(aes(clust.res, n.clust)) +
geom_line() +
geom_point(size = 3) +
theme_miko() +
xlab("Resolution") +
ylab("Cluster Number") +
labs(title = "Cluster Resolution vs. Number of Cell Populations") +
theme(panel.grid.minor = element_line(colour="grey95", size=0.1),
panel.grid.major = element_line(colour="grey85", size=0.1),
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank()) +
scale_x_continuous(minor_breaks = seq(0 , max(df.purity.sum$resolution, na.rm = T), 0.1) ,
breaks = seq(0, max(df.purity.sum$resolution, na.rm = T), 0.2))
if (parameter.list$print.inline){
print(plt.clust.n)
print(plt.deg.n)
}
```
```{r resolution-specific performance metrics, fig.width=15, fig.height=5}
marker.spec.list <- list()
u.res <- unique(qm.res.sum.sum.all$resolution)
for (i in 1:length(u.res)){
qm.res.sum.sum.cur <- qm.res.sum.sum.all
qm.res.sum.sum.cur$is.res <- qm.res.sum.sum.cur$resolution %in% u.res[i]
qm.res.sum.sum.cur <- qm.res.sum.sum.cur %>% dplyr::arrange(is.res)
u.res2 <- unique(as.character(qm.res.sum.sum.cur$resolution))
qm.res.sum.sum.cur$res <- factor(qm.res.sum.sum.cur$resolution, levels = u.res2)
marker.spec.list[[u.res[i]]] <- qm.res.sum.sum.cur %>%
dplyr::arrange(is.res) %>%
ggplot(aes(x = bin, y = pdeg, color = is.res)) +
geom_path(aes(group = res)) +
theme_miko(legend = F) +
scale_color_manual(values = c("TRUE" = "tomato", "FALSE" = "grey")) +
xlab("target DEG-AUC") +
ylab("Proportion of Clusters\n(>=1 DEG exceeding target DEG-AUC)") +
geom_vline(xintercept = 0.5, linetype = "dashed") +
labs(title = "Marker Specificity",
color = "Resolution", caption = paste0("red curve: resolution = ", gsub(assay.pattern, "", u.res[i]), "; grey curve: other resolutions" ))
}
res.specific.list <- list()
for (i in 1:length(query_markers.list)){
set.name <- names(query_markers.list)[i]
set.name2 <- gsub(assay.pattern, "", set.name)
if (!is.null(df.silw)){
avg.sil <- signif(mean((df.silw %>% dplyr::filter(cluster.resolution == set.name2) )$sil.width, na.rm = T), 3)
if (!is.nan(avg.sil)){
# A: silhoutte
p1 <- sil.plot[[set.name]] +
theme_set(theme_get() + theme(text = element_text(family = 'Open Sans'))) +
theme_miko(legend = T) +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank()) +
labs(x = "Cell Index", y = "Silhouette Width") +
labs(title = "Silhoutte Plot", subtitle = NULL) +
labs(subtitle = paste0("Resolution = ", set.name2), caption = paste0("Average silhouette width = ", avg.sil))
} else {
p1 <- NULL
}
} else {
p1 <- NULL
}
# B: purity score UMAP
avg.pur <- signif(mean(purity.umap.list[[set.name]]$data$purity, na.rm = T), 3)
p2 <- purity.umap.list[[set.name]] + labs(subtitle = paste0("Resolution = ", set.name2),
caption = paste0("Average purity score = ", avg.pur))
# C: marker specificity score
p3 <- marker.spec.list[[set.name2]] + labs(subtitle = paste0("Resolution = ", set.name2))
res.specific.list[[set.name]] <- cowplot::plot_grid(p1, p2, p3, nrow = 1, align = "h", axis = "tb", labels = c("H", "I", "J"))
if (parameter.list$print.inline){
print(res.specific.list[["SCT_snn_res.0.05"]])
}
}
```
```{r central log}
# update central log
if (parameter.list$developer){
run.id <- NULL
if (!exists("user")) user <- "guest"
clog.update.success <- F
if (parameter.list$update.log){
try({
run.id <- updateCentralLog(Module = "M18", input.data = parameter.list$input.file, input.subset = NA, pdf.flag = parameter.list$save.pdf)
clog.update.success <- T
}, silent = F)
}
if (is.null(run.id)) {
warning("Central log update was unsuccessful :(\n")
# run.id <- paste("M18_", user, "_r", paste0(format(Sys.time(), '%s')), sep = "", collapse = "")
run.id <- paste("M18_", user, "_", gsub(":| ", "", paste0(format(Sys.time(), '%d_%m_%X'))), sep = "", collapse = "")
}
} else {
run.id <- paste("output_", gsub(":| ", "", paste0(format(Sys.time(), '%X'))), sep = "", collapse = "")
}
```
```{r output directories}
# output path
if (!exists("data.path")) data.path = ""
output.path <- paste0(data.path, "Module_Outputs/", paste0(run.id,"_",format(Sys.time(), '%d%m%y'), "/"))
# create output directories
if (parameter.list$save.pdf) {
dir.create(output.path)
dir.create(paste0(output.path, "Tables/"))
dir.create(paste0(output.path, "PDF/"))
}
```
Results
=====================================
Sidebar {.sidebar data-width=400}
-------------------------------------
**Description**: Cells are clustered using unsupervised Louvain community detection over a range of candidate resolutions. At each resolution, i) cluster-level UMAP and heatmap representations are generated, ii) resolution-specific performance metrics are derived, and iii) cluster-specific differential expression analysis is performed.
**Figure Legends**:\
**A|** Resolution-specific UMAP representation of cell clusters.\
**B|** Resolution-specific heatmap of top variable genes. Mean expression values are scaled row-wise.\
**C|** Resolution-specific silhouette widths. *Range* = [-1,1]\
**D|** Resolution-specific purity scores. *Range* = [0,1]\
**E|** Resolution-specific marker specificity scores. *Range* = [0,1]\
**F|** Resolution-specific number of cell clusters.\
**G|** Resolution-specific minimum number of differentially-expressed genes per cluster.\
**H|** Silhouette plot.\
**I|** Purity scores overlaid on a UMAP. See notes for details. \
**J|** Marker specificity curve. See notes for details. *Red curve*: specified resolution; *grey curves*: all other resolutions.\
**Notes**:\
**1. Purity Score**: For a given cell *i*, the purity score is defined as the proportion of cells within cell *i*'s K-nearest neighborhood (KNN) that belong to the most represented cluster within that KNN. The purity score ranges between 0 to 1. Neighborhoods in which cells belong to many different clusters are considered "*impure*" (low purity score) whereas neighborhoods in which cells belong to a single cluster are "*pure*" (high purity score). The purity score, together with the silhouette width, provides a measure of cluster consistency and may be used in guiding selection of the optimal clustering resolution.\
**2. Marker Specificity**: The marker specificity score is a measure of how well differentially-expressed genes (DEG) can discriminate (i.e., classify) clusters at a given resolution. To derive the specificity score, nCDI estimates are used to construct a *marker specificity curve* (Panel J). The x axis represents nCDI values ranging from 0 to 1. The y axis represents the proportion of clusters that have at least 1 DEG exceeding the target nCDI (on the x axis). The area under the marker specificity curve represents the *marker specificity score* and can range from 0 (no clusters have a specific DEG) to 1 (each cluster has a specific DEG). The marker specificity score can be compared across different clustering resolutions to determine at which resolution clusters have specific DEGs.\
**Definitions**:\
**res**: Clustering resolution\
**UMAP**: Uniform manifold approximation and projection\
**nCDI**: Normalized co-dependency index\
**Sample Overview**:\
Cells, n: `r ncol(so.query)`\
Genes, n: `r nrow(so.query)`\
Variable genes, n: `r length(VariableFeatures(so.query))`\
UMI/cell, median: `r round(median(so.query@meta.data[["nCount_RNA"]]))`\
genes/cell, median: `r round(median(so.query@meta.data[["nFeature_RNA"]]))`\
Species: `r parameter.list$species`
Row {.tabset}
-------------------------------------
```{r plt.umap_by_all_clusters, message=FALSE, warning=FALSE, fig.width=12, fig.height=5}
plt.clust.ov <- list()
for (i in 1:length(plt.umap_by_cluster)){
set.name <- names(plt.umap_by_cluster)[i]
plt.clust.ov[[set.name]] <- cowplot::plot_grid(plt.umap_by_cluster[[set.name]], plt.heat.scaled.list[[set.name]], labels = "AUTO")
}
out <- lapply(seq_along(plt.clust.ov), function(i) {
set.name <- gsub(assay.pattern, "", names(plt.clust.ov)[i])
set.name2 <- names(plt.clust.ov)[i]
a1 <- knitr::knit_expand(text = sprintf("\n### %s\n", paste("res = ", set.name, sep = ""))) # tab header
a2 <- knitr::knit_expand(text = sprintf("\n```{r %s, message=FALSE, warning=FALSE, fig.width=12, fig.height=5}",
paste("clust_umap", i, sep = ""))) # start r chunk
a3 <- knitr::knit_expand(text = sprintf("\nprint(plt.clust.ov[['%s']])",set.name2))
a4 <- knitr::knit_expand(text = "\n```\n") # end r chunk
paste(a1, a2, a3, a4, collapse = '\n') # collapse together all lines with newline separator
})
```
`r paste(knitr::knit(text = paste(out, collapse = '\n')))`
```{r pdf plt.umap_by_all_clusters }
try({
for (i in 1:length(plt.clust.ov)){
set.name <- gsub(assay.pattern, "", names(plt.clust.ov)[i])
suffix = "resolution.pdf"
plot.name <- paste0("M18_umap_", gsub(".", "_", set.name, fixed = T), suffix)
savePDF(file.name = paste0(output.path, "PDF/", plot.name), plot.handle = plt.clust.ov[[i]],
fig.width = 12, fig.height = 5, save.flag = parameter.list$save.pdf)
}
}, silent = T)
```
Row {.tabset}
-------------------------------------
### Metrics
```{r opt stat, message=FALSE, warning=FALSE, fig.width=25, fig.height=5}
sil.success <- F
try({
plt.silw.dep + labs(subtitle = "1 = dense/separated clusters; 0 = overlapping clusters")
sil.success <- T
}, silent = T)
if (!sil.success) plt.silw.dep <- NULL
plt.opt.stat <- cowplot::plot_grid( plt.silw.dep + geom_hline(yintercept = 0.6, linetype = "dashed") ,
plt.cluster.purity + labs(subtitle = "1 = within-neighborhood clusters consistent (pure); 0 = impure"),
plt.auc.spec + labs(subtitle = "1 = specific DEG in all clusters; 0 = no specific DEGs"),
plt.clust.n + labs(title = "Number of Clusters", subtitle = ""),
plt.deg.n + labs(title = "Minimum Number of DEGs", y = "Minimum N DEGs/cluster", subtitle = ""), nrow = 1, labels = c("C", "D", "E", "F", "G"), rel_widths = c(1,1,1,1,1))
print(plt.opt.stat)
try({
savePDF(file.name = paste0(output.path, "PDF/", "M18_optimization_metrics.pdf"),
plot.handle = plt.opt.stat,
fig.width=25, fig.height=5, save.flag = parameter.list$save.pdf)
}, silent = T)
```
```{r plt.Silhouette, message=FALSE, warning=FALSE, fig.width=15, fig.height=5}
out <- lapply(seq_along(res.specific.list), function(i) {
res.val <- gsub(assay.pattern, "", names(res.specific.list)[i])
a1 <- knitr::knit_expand(text = sprintf("\n### %s\n", paste("res = ", res.val, sep = ""))) # tab header
a2 <- knitr::knit_expand(text = sprintf("\n```{r %s, message=FALSE, warning=FALSE, fig.width=15, fig.height=5}",
paste("sil.plt", i, sep = ""))) # start r chunk
a3 <- knitr::knit_expand(text = sprintf("\nprint(res.specific.list[[ names(res.specific.list)[%d]]])",i))
a4 <- knitr::knit_expand(text = "\n```\n") # end r chunk
paste(a1, a2, a3, a4, collapse = '\n') # collapse together all lines with newline separator
})
```
`r paste(knitr::knit(text = paste(out, collapse = '\n')))`
```{r pdf plt.Silhouette }
try({
for (i in 1:length(res.specific.list)){
res.val <- gsub(assay.pattern, "", names(res.specific.list)[i])
plot.name <- paste0("M18_performance_metrics_", gsub(".", "_",res.val, fixed = T), "resolution.pdf")
savePDF(file.name = paste0(output.path, "PDF/", plot.name), plot.handle = res.specific.list[[i]],
fig.width = 15, fig.height = 5, save.flag = parameter.list$save.pdf)
}
}, silent = T)
```
DEG Table
=====================================
Sidebar {.sidebar}
-------------------------------------
**Description**: Differentially-expressed genes (DEG) across different cluster resolutions. Differential expression performed using wilcoxon and co-dependency tests.
**Definitions**:\
**feature**: Gene symbol\
**cluster**: Cluster name\
**logFC**: Log fold-change between gene expression within vs. outside of cluster.\
**auc**: Area under receiver operator curve; measures a gene's ability to discriminate specified cluster. AUC values range 0 to 1, where 0.5 represents classification performance equal to random chance, and 1 represents perfect cluster classification.\
**p_wilcox**: Nominal p value, wilcoxon test.\
**fdr_wilcox**: Benjamini-hochberg adjusted p-value (i.e. FDR), wilcoxon test.\
**cdi**: Codependency index\
**ncdi**: Normalized CDI\
**p_cdi**: Nominal p value, codependency test.\
**fdr_cdi**: Benjamini-hochberg adjusted p-value (i.e. FDR), codependency test.\
**pct_in**: Percentage of cells within cluster with non-zero gene expression.\
**pct_out**: Percentage of cells outside cluster with non-zero gene expression.\
**pct_dif**: Difference between pct_in and pct_out.\
**sensitivity**: Proportion of within-cluster cells that express gene (true positive rate).\
**specificity**: Proportion of outside-cluster cells that do not express gene (true negative rate).\
**PPV**: Positive predictive value\
**NPV**: Negative predictive value\
**denominator**: Denominator used to normalized CDI. Dependent on cluster size.\
**avgExpr**: Average cluster-level gene expression.\
Row {.tabset}
-------------------------------------
```{r detailed table output v2, echo = FALSE, eval = TRUE, message=FALSE, warning=FALSE}
for (i in 1:length(top_markers_res.list)){
# df.deg <- top_markers_res.list[[i]]
df.deg.cur <- top_markers_res.list[[i]]
df.deg.cur$pct_in <- df.deg.cur$pct_in
df.deg.cur$sensitivity <- df.deg.cur$pct_in/((100-df.deg.cur$pct_in) + (df.deg.cur$pct_in))
df.deg.cur$specificity <- (100-df.deg.cur$pct_out)/((100-df.deg.cur$pct_out) + (df.deg.cur$pct_out))
df.deg.cur$PPV <- df.deg.cur$pct_in/( df.deg.cur$pct_in + (df.deg.cur$pct_out))
df.deg.cur$NPV <- (100-df.deg.cur$pct_out)/((100-df.deg.cur$pct_out) + (100-df.deg.cur$pct_in))
set.stem <- paste0(assay.pattern, names(top_markers_res.list)[i], "_")
cdi_result.cur <- cdi_result %>% dplyr::filter(grepl(set.stem, feature.x))
cdi_result.cur$cluster <- paste0("c", stringr::str_remove(cdi_result.cur$feature.x, set.stem))
colnames(cdi_result.cur) <- c("id", "feature", "cdi", "ncdi", "denominator", "p.cdi", "fdr.cdi", "group")
df.deg <- merge(df.deg.cur, cdi_result.cur, all.x = T)
field2signif <- c("avgExpr", "logFC", "auc", "pval", "padj", "pct_in", "pct_out", "pct_dif", "cdi", "ncdi", "denominator", "p.cdi", "fdr.cdi", "sensitivity", "specificity", "PPV", "NPV")
field2signif <- field2signif[field2signif %in% colnames(df.deg)]
df.deg[ ,field2signif] <- signif(df.deg[ ,field2signif], 3)
df.deg <- df.deg %>% dplyr::select(c("feature", "group", "logFC", "auc", "pval", "padj", "cdi", "ncdi", "p.cdi", "fdr.cdi", "pct_in", "pct_out", "pct_dif", "sensitivity", "specificity", "PPV", "NPV", "denominator","avgExpr"))
colnames(df.deg) <- c("feature", "cluster", "logFC", "auc", "p_wilcox", "fdr_wilcox", "cdi", "ncdi","p_cdi", "fdr_cdi", "pct_in", "pct_out", "pct_dif", "sensitivity", "specificity", "PPV", "NPV", "denominator", "avgExpr")
top_markers_res.list[[i]] <- df.deg
}
out_heat <- lapply(seq_along(top_markers_res.list), function(i) {
s1 <- paste("top_markers_res.list[[", i, "]]", sep = "")
table.name <- paste("res=", names(top_markers_res.list)[i], "")
s4 <- paste("datatable(", s1, ",
filter = 'top',
extensions = 'Buttons',
options = list(pageLength = 50,
dom = 'Bfrtip',
buttons = c('copy', 'csv', 'pdf')))", sep = "")
a1 <- knitr::knit_expand(text = sprintf("### %s\n", table.name)) # tab header
a2 <- knitr::knit_expand(text = sprintf("\n```{r %s}", paste("detailed_tab_v2", i, sep = ""))) # start r chunk
a3 <- knitr::knit_expand(text = sprintf("\n %s",s4))
a4 <- knitr::knit_expand(text = "\n```\n") # end r chunk
paste(a1, a2, a3, a4, collapse = '\n') # collapse together all lines with newline separator
})
```
`r paste(knitr::knit(text = paste(out_heat, collapse = '\n')))`
```{r csv cm2}
for (i in 1:length(top_markers_res.list)){
table.name <- paste0("markerStatistics_", gsub(".", "_", names(top_markers_res.list)[i], fixed = T), "resolution.csv")
if (parameter.list$save.pdf){
write.csv(top_markers_res.list[[i]], file = paste0(output.path, "Tables/", table.name),
row.names = F)
}
}
```
```{r save analysis log and Rdata results, message=FALSE, warning=FALSE}
# Run time
end.time <- proc.time()
elapsed.time <- round((end.time - start.time)[[3]], 2)
df.log[nrow(df.log)+1, 1] <- as.character("Run Time (s)")
df.log[nrow(df.log), 2] <- as.character("elapsed.time")
df.log[nrow(df.log), 3] <- as.character(elapsed.time)
df.log <- addLogEntry("Run Identifier", run.id, df.log, "run.id")
if (parameter.list$developer){
df.log <- addLogEntry("User", user, df.log, "user")
df.log <- addLogEntry("Central Log Updated", clog.update.success, df.log, "clog.update.success")
}
df.log_Module_18 <- df.log
```
```{r ph10, echo = FALSE, eval = TRUE, message=FALSE, warning=FALSE}
out1 <- lapply(seq_along(module.logs), function(i) {
module.n <- as.numeric(gsub("[^\\d]+", "", module.logs[i], perl=TRUE))
a1 <- knitr::knit_expand(text = sprintf("\nLog (Module %s)", paste(module.n)))
a2 <- knitr::knit_expand(text = "\n=====================================")
a3 <- knitr::knit_expand(text = sprintf("\n```{r %s}", paste("mod_", i, sep = ""))) # start r chunk
a4<- knitr::knit_expand(text = sprintf("\nknitr::kable(%s)",module.logs[i]))
a5 <- knitr::knit_expand(text = "\n```\n") # end r chunk
paste(a1, a2, a3, a4, a5, collapse = '\n') # collapse together all lines with newline separator
})
```
`r paste(knitr::knit(text = paste(out1, collapse = '\n')))`
Log (Module 18)
=====================================
```{r table.log_current, message=FALSE, warning=FALSE}
knitr::kable(df.log_Module_18)
```
```{r save analysis log as csv}
try({
if (parameter.list$developer){
write.csv(df.log_Module_18, file = paste0(output.path, "Tables/", "analysisLog.csv"),
row.names = F)
}
}, silent = T)
```
System Info
=====================================
```{r}
pander::pander(sessionInfo())
```