Last updated: 2018-07-16
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180618)
The command set.seed(20180618)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 884f74e
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: R/.Rhistory
Ignored: analysis/.Rhistory
Ignored: analysis/correcting_detection_rate/.Rhistory
Ignored: analysis/pipeline/.Rhistory
Untracked files:
Untracked: ..gif
Untracked: .DS_Store
Untracked: R/.DS_Store
Untracked: analysis/.DS_Store
Untracked: analysis/bibliography.bib
Untracked: analysis/clean_data/
Untracked: analysis/downsampling/
Untracked: analysis/dropouts/
Untracked: analysis/filteringtest.R
Untracked: analysis/old/
Untracked: analysis/pipeline/large_sets.pdf
Untracked: analysis/pipeline/temp_ari.txt
Untracked: analysis/pipeline/temp_time.txt
Untracked: analysis/sysdata_reference_test/
Untracked: analysis/tutorial_cache/
Untracked: analysis/writeup/cite.bib
Untracked: analysis/writeup/cite.log
Untracked: analysis/writeup/paper.aux
Untracked: analysis/writeup/paper.bbl
Untracked: analysis/writeup/paper.blg
Untracked: analysis/writeup/paper.log
Untracked: analysis/writeup/paper.out
Untracked: analysis/writeup/paper.synctex.gz
Untracked: analysis/writeup/paper.tex
Untracked: analysis/writeup/writeup.aux
Untracked: analysis/writeup/writeup.bbl
Untracked: analysis/writeup/writeup.blg
Untracked: analysis/writeup/writeup.dvi
Untracked: analysis/writeup/writeup.log
Untracked: analysis/writeup/writeup.out
Untracked: analysis/writeup/writeup.synctex.gz
Untracked: analysis/writeup/writeup.tex
Untracked: analysis/writeup/writeup2.aux
Untracked: analysis/writeup/writeup2.bbl
Untracked: analysis/writeup/writeup2.blg
Untracked: analysis/writeup/writeup2.log
Untracked: analysis/writeup/writeup2.out
Untracked: analysis/writeup/writeup2.pdf
Untracked: analysis/writeup/writeup2.synctex.gz
Untracked: analysis/writeup/writeup2.tex
Untracked: analysis/writeup/writeup3.aux
Untracked: analysis/writeup/writeup3.log
Untracked: analysis/writeup/writeup3.out
Untracked: analysis/writeup/writeup3.synctex.gz
Untracked: analysis/writeup/writeup3.tex
Untracked: data/unnecessary_in_building/
Unstaged changes:
Modified: R/SLSL.R
Modified: analysis/_site.yml
Modified: analysis/pipeline/large_sets.Rmd
Modified: analysis/pipeline/small_good_sets.Rmd
Modified: analysis/pipeline/small_good_sets_result.txt
Modified: analysis/pipeline/small_good_sets_time.txt
Modified: analysis/writeup/.DS_Store
Modified: src/SCNoisyClustering.cpp
Modified: src/SCNoisyClustering.o
Modified: src/SCNoisyClustering.so
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
i = 2
load('../data/unnecessary_in_building/2_Pollen.RData')
X = as.matrix(Pollen$x)
truth = as.numeric(as.factor(Pollen$label))
numClust = length(unique(truth))
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
det2 = qr(det)
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R))
tsne2 = Rtsne(t(logX))
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.7755788
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.8659285
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.7665684
i = 3
load('../data/unnecessary_in_building/3_Usoskin.RData')
X = as.matrix(Usoskin$X)
truth = as.numeric(as.factor(as.character(Usoskin$lab1)))
numClust = 4
rm(Usoskin)
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
plot(irlba(logX,1)$v[,1]~log(det))
det2 = qr(cbind(rep(1, length(det)), log(det)))
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R), perplexity=20)
tsne2 = Rtsne(t(logX))
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.6188269
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.7345628
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.6348444
i = 4
#read data
load('../data/unnecessary_in_building/4_Buettner.RData')
X = as.matrix(Buettner$X)
truth = as.numeric(as.factor(Buettner$label))
numClust = 3
rm(Buettner)
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
det2 = qr(det)
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R), perplexity=20)
tsne2 = Rtsne(t(logX), perplexity=20)
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.4329975
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.428236
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.465002
i = 5
load('../data/unnecessary_in_building/5_Yan.rda')
X = as.matrix(yan)
truth = as.character(ann$cell_type1)
truth = as.numeric(as.factor(truth))
numClust = 6
rm(ann, yan)
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
det2 = qr(det)
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R), perplexity=20)
tsne2 = Rtsne(t(logX), perplexity=20)
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.9484199
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.9484199
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.7952752
i = 6
load('../data/unnecessary_in_building/6_Treutlein.rda')
X = as.matrix(treutlein)
truth = as.numeric(colnames(treutlein))
ind = sort(truth, index.return=TRUE)$ix
X = X[,ind]
truth = truth[ind]
numClust = length(unique(truth))
rm(treutlein)
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
det2 = qr(cbind(log(det), rep(1, length(det))))
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R), perplexity=10)
tsne2 = Rtsne(t(logX), perplexity=10)
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.794193
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.9603128
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.6517367
i = 7
load('../data/unnecessary_in_building/7_Chu_celltype.Rdata')
X = as.matrix(Chu_celltype$X)
truth = as.numeric(as.factor(Chu_celltype$label))
numClust = 7
rm(Chu_celltype)
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
det2 = qr(det)
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R))
tsne2 = Rtsne(t(logX))
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.7785884
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.7770962
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.775129
i = 8
load('../data/unnecessary_in_building/8_Chu_timecourse.Rdata')
X = as.matrix(Chu_timecourse$X)
truth = as.numeric(as.factor(Chu_timecourse$label))
numClust = length(unique(truth))
logX = log(X+1)
det = colSums(X!=0) / nrow(X)
det2 = qr(det)
R = t(qr.resid(det2, t(logX)))
pca1 = irlba(R,2); pca2 = irlba(logX,2)
dat = data.frame(pc1=c(pca1$v[,1], pca2$v[,1]), detection.rate=rep(det, 2), label=rep(c("After correction", "Before correction"), each=nrow(pca1$v)), true.label=as.factor(rep(truth,2)))
ggplot(dat, aes(x=pc1, y=detection.rate, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("PCA")
tsne1 = Rtsne(t(R))
tsne2 = Rtsne(t(logX))
dat = data.frame(v1 = c(tsne1$Y[,1], tsne2$Y[,1]), v2 = c(tsne1$Y[,2], tsne2$Y[,2]), label=rep(c("After correction", "Before correction"), each=nrow(tsne1$Y)), true.label = as.factor(rep(truth, 2)))
ggplot(dat, aes(x=v1, y=v2, col=true.label)) + facet_grid(~label) + geom_point() + ggtitle("tSNE")
set.seed(1); res1 = SLSL(R, log=F, filter=F)
adj.rand.index(res1$result, truth)
[1] 0.7321747
set.seed(1); res2 = SLSL(X, log=T, filter=F, correct_detection_rate = F)
adj.rand.index(res2$result, truth)
[1] 0.7276994
set.seed(1); res3 = SLSL(X, log=T, filter=F, correct_detection_rate = T)
adj.rand.index(res3$result, truth)
[1] 0.9692303
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stargazer_5.2.2 abind_1.4-5
[3] broom_0.4.5 diceR_0.5.1
[5] Rtsne_0.13 fossil_0.3.7
[7] shapefiles_0.7 foreign_0.8-70
[9] maps_3.3.0 sp_1.2-7
[11] reshape_0.8.7 dplyr_0.7.6
[13] ggplot2_3.0.0 irlba_2.3.2
[15] Matrix_1.2-14 quadprog_1.5-5
[17] inline_0.3.15 matrixStats_0.53.1
[19] SCNoisyClustering_0.1.0
loaded via a namespace (and not attached):
[1] tidyselect_0.2.4 reshape2_1.4.3 purrr_0.2.4
[4] lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6
[7] yaml_2.1.19 rlang_0.2.0 R.oo_1.22.0
[10] pillar_1.2.2 glue_1.2.0 withr_2.1.2
[13] R.utils_2.6.0 bindrcpp_0.2.2 plyr_1.8.4
[16] bindr_0.1.1 stringr_1.3.0 munsell_0.4.3
[19] gtable_0.2.0 workflowr_1.1.1 R.methodsS3_1.7.1
[22] codetools_0.2-15 psych_1.8.3.3 evaluate_0.10.1
[25] labeling_0.3 knitr_1.20 parallel_3.5.1
[28] Rcpp_0.12.16 scales_0.5.0 backports_1.1.2
[31] mnormt_1.5-5 digest_0.6.15 stringi_1.1.7
[34] grid_3.5.1 rprojroot_1.3-2 tools_3.5.1
[37] magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2
[40] tidyr_0.8.0 whisker_0.3-2 pkgconfig_2.0.1
[43] assertthat_0.2.0 rmarkdown_1.9 R6_2.2.2
[46] mclust_5.4 nlme_3.1-137 git2r_0.21.0
[49] compiler_3.5.1
This reproducible R Markdown analysis was created with workflowr 1.1.1