Last updated: 2018-05-10
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180510)
The command set.seed(20180510)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: fc22b3f
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: .DS_Store
Untracked: analysis/chunks.R
Untracked: code/Microarray_AAD_Mayfield.R
Untracked: code/Microarray_ASD_Chow.R
Untracked: code/Microarray_ASD_Garbett.R
Untracked: code/Microarray_ASD_Voineagu.R
Untracked: code/Microarray_All_GLM.R
Untracked: code/Microarray_IBD_Granlund.R
Untracked: code/Microarray_IBD_Noble.R
Untracked: code/Microarray_MDD_Sibille.R
Untracked: code/Microarray_SCZ_BD_Chen.R
Untracked: code/Microarray_SCZ_BD_Iwa.R
Untracked: code/Microarray_SCZ_BD_MDD_Lanz.R
Untracked: code/Microarray_SCZ_Maycox.R
Untracked: code/Microarray_SCZ_Narayan.R
Untracked: data/.DS_Store
Untracked: data/raw_data/
Untracked: data/results/
Untracked: data/working_data/
Untracked: output/MashCB_EE_Cov.rds
Untracked: output/MashCB_model_EE.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | fc22b3f | zouyuxin | 2018-05-10 | wflow_publish(“analysis/MashCommonbaseline.Rmd”) |
Last updated: 2018-05-10
Code version: fc22b3f
library(limma); library(mashr); library(mclust); library(plyr);
Loading required package: ashr
Package 'mclust' version 5.4
Type 'citation("mclust")' for citing this R package in publications.
Attaching package: 'mclust'
The following object is masked from 'package:ashr':
dens
library(flashr); library(colorRamps); library(corrplot)
corrplot 0.84 loaded
Implement mash
on John’s data at time point 18.
data = readRDS('data/results/Microarray_compiledGLM.rds')
The standard errors in the data are from t distribution with df 733. Since pt(-abs(Bhat/Shat), df)
is very close to zero, it is hard to obtain the standard error from the normal distribution. The degree of freedom is large here, we use the original standard error.
mash.data = mash_set_data(Bhat = data$Chat, Shat = data$SE)
L = diag(ncol(data$Chat))
# the 4th col is CTL
L[,4] = -1
row.names(L) = colnames(data$Chat)
L = L[-4,]
mash.data.diff = mash_set_data_contrast(mash.data, L)
Top genes:
# find strong genes
m.1by1 = mash_1by1(mash.data.diff, alpha=0)
strong = get_significant_results(m.1by1)
Z = mash.data.diff$Bhat/mash.data.diff$Shat
Z.strong = Z[strong,]
# center
Z.center = apply(Z.strong, 2, function(x) x - mean(x))
Data Driven:
Flash:
flash.data = flash_set_data(Z.center)
fmodel = flash(flash.data, greedy = TRUE, backfit = TRUE)
fitting factor/loading 1
fitting factor/loading 2
fitting factor/loading 3
fitting factor/loading 4
factors = flash_get_ldf(fmodel)$f
row.names(factors) = row.names(L)
pve.order = order(flash_get_pve(fmodel), decreasing = TRUE)
par(mfrow=c(1,3))
for(i in pve.order){
barplot(factors[,i], main=paste0('Factor ',i, ' pve= ', round(flash_get_pve(fmodel)[i],3)), las=2, cex.names = 0.7)
}
par(mfrow=c(1,1))
flash
on the loading:
loading = fmodel$EL[,1:3]
colnames(loading) = paste0('Factor',seq(1,3))
flash.loading = flash_set_data(loading)
flmodel = flash(flash.loading, greedy = TRUE, backfit = TRUE)
fitting factor/loading 1
The flash prefers the rank 0 model. There is no hidden structure in the loading matrix.
Cluster loadings:
mod = Mclust(loading)
summary(mod$BIC)
Best BIC values:
VVI,9 VVE,9 VVV,9
BIC -39480.53 -39486.265635 -39619.0773
BIC diff 0.00 -5.738225 -138.5499
Using clustering result to fit mash
:
\[l_{i}\sim \sum_{j=1}^{m}N(\mu_{j}, \Sigma_{j})\] We estimate the covariance as \(F(\Sigma_j + \mu_{j}\mu_{j}')F'\).
U_list = alply(mod$parameters$variance$sigma,3)
mu_list = alply(mod$parameters$mean,2)
ll = list()
for (i in 1:length(U_list)){
ll[[i]] = U_list[[i]] + mu_list[[i]] %*% t(mu_list[[i]])
}
Factors = fmodel$EF[,1:3]
U.loading = lapply(ll, function(U){Factors %*% (U %*% t(Factors))})
names(U.loading) = paste0('Load', "_", (1:length(U.loading)))
# rank 1
Flash_res = flash_get_lf(fmodel)
U.Flash = c(mashr::cov_from_factors(t(as.matrix(factors)), "Flash"),
list("tFlash" = t(Flash_res) %*% Flash_res / nrow(Z.center)))
PCA:
U.pca = cov_pca(mash_set_data(Z.center), 3)
Canonical
U.c = cov_canonical(mash_set_data(Z.center))
Extreme Deconvolution
U.dd = c(U.pca, U.loading, U.Flash, list('XX' = t(Z.center) %*% Z.center / nrow(Z.center)))
mash.data.ed = mash.data.diff
mash.data.ed$Bhat = mash.data.diff$Bhat[strong,]
mash.data.ed$Shat = mash.data.diff$Shat[strong,]
mash.data.ed$Shat_alpha = mash.data.diff$Shat_alpha[strong,]
saveRDS(cov_ed(mash.data.ed, U.dd), 'output/MashCB_EE_Cov.rds')
U.ed = readRDS('output/MashCB_EE_Cov.rds')
saveRDS(mash(mash.data.diff, c(U.c, U.ed), algorithm.version = 'R'), 'output/MashCB_model_EE.rds')
mash.model = readRDS('output/MashCB_model_EE.rds')
The log-likelihood of fit is
get_loglik(mash.model)
[1] 63723.63
Here is a plot of weights learned:
options(repr.plot.width=12, repr.plot.height=4)
barplot(get_estimated_pi(mash.model), las = 2, cex.names = 0.7)
Check tPCA covariance matrix
x <- mash.model$fitted_g$Ulist[["ED_tPCA"]]
colnames(x) <- row.names(L)
rownames(x) <- colnames(x)
corrplot(x, method='color', cl.lim=c(-0.1,1), type='upper', addCoef.col = "black", tl.col="black", tl.srt=45, col=colorRampPalette(c("blue","white","red"))(200))
layout(matrix(c(1,2,3,4), 2, 2, byrow=TRUE))
svd.out = svd(mash.model$fitted_g$Ulist[["ED_tPCA"]])
v = svd.out$v
colnames(v) = row.names(L)
rownames(v) = colnames(v)
options(repr.plot.width=10, repr.plot.height=5)
for (j in 1:4)
barplot(v[,j]/v[,j][which.max(abs(v[,j]))], cex.names = 0.7,
las = 2, main = paste0("EigenVector ", j, " for tPCA"))
Check Load 9 covariance matrix
x <- mash.model$fitted_g$Ulist[["ED_Load_9"]]
colnames(x) <- row.names(L)
rownames(x) <- colnames(x)
corrplot(x, method='color', cl.lim=c(-0.2,1), type='upper', addCoef.col = "black", tl.col="black", tl.srt=45, col=colorRampPalette(c("blue","white","red"))(200))
layout(matrix(c(1,2,3,4), 2, 2, byrow=TRUE))
svd.out = svd(mash.model$fitted_g$Ulist[["ED_Load_9"]])
v = svd.out$v
colnames(v) = row.names(L)
rownames(v) = colnames(v)
options(repr.plot.width=10, repr.plot.height=5)
for (j in 1:4)
barplot(v[,j]/v[,j][which.max(abs(v[,j]))], cex.names = 0.7,
las = 2, main = paste0("EigenVector ", j, " for Load 9"))
Check Load 8 covariance matrix
x <- mash.model$fitted_g$Ulist[["ED_Load_8"]]
colnames(x) <- row.names(L)
rownames(x) <- colnames(x)
corrplot(x, method='color', cl.lim=c(-0.2,1), type='upper', addCoef.col = "black", tl.col="black", tl.srt=45, col=colorRampPalette(c("blue","white","red"))(200))
layout(matrix(c(1,2,3,4), 2, 2, byrow=TRUE))
svd.out = svd(mash.model$fitted_g$Ulist[["ED_Load_8"]])
v = svd.out$v
colnames(v) = row.names(L)
rownames(v) = colnames(v)
options(repr.plot.width=10, repr.plot.height=5)
for (j in 1:4)
barplot(v[,j]/v[,j][which.max(abs(v[,j]))], cex.names = 0.7,
las = 2, main = paste0("EigenVector ", j, " for Load 8"))
Check PCA1 covariance matrix
x <- mash.model$fitted_g$Ulist[["ED_PCA_1"]]
colnames(x) <- row.names(L)
rownames(x) <- colnames(x)
corrplot(x, method='color', cl.lim=c(-0.2,1), type='upper', addCoef.col = "black", tl.col="black", tl.srt=45, col=colorRampPalette(c("blue","white","red"))(200))
layout(matrix(c(1,2,3,4), 2, 2, byrow=TRUE))
svd.out = svd(mash.model$fitted_g$Ulist[["ED_PCA_1"]])
v = svd.out$v
colnames(v) = row.names(L)
rownames(v) = colnames(v)
options(repr.plot.width=10, repr.plot.height=5)
for (j in 1:4)
barplot(v[,j]/v[,j][which.max(abs(v[,j]))], cex.names = 0.7,
las = 2, main = paste0("EigenVector ", j, " for PCA 1"))
There are 5187 diferentially expressed genes.
Check pairwise sharing by magnitude and sign:
x = get_pairwise_sharing(mash.model)
x[x > 1] <- 1
x[x < -1] <- -1
colnames(x) <- row.names(L)
rownames(x) <- colnames(x)
corrplot.mixed(x, tl.pos="d",upper='color', cl.lim=c(0,1), upper.col=colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(40),
tl.cex=1.2)
Check pairwise sharing by sign:
x = get_pairwise_sharing(mash.model, factor=0)
x[x > 1] <- 1
x[x < -1] <- -1
colnames(x) <- row.names(L)
rownames(x) <- colnames(x)
corrplot.mixed(x, tl.pos="d",upper='color', cl.lim=c(0,1), upper.col=colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(40),
tl.cex=1.2)
sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] corrplot_0.84 colorRamps_2.3 flashr_0.5-6 plyr_1.8.4
[5] mclust_5.4 mashr_0.2-6 ashr_2.2-7 limma_3.34.9
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 pillar_1.2.2 compiler_3.4.4
[4] git2r_0.21.0 workflowr_1.0.1 R.methodsS3_1.7.1
[7] R.utils_2.6.0 iterators_1.0.9 tools_3.4.4
[10] digest_0.6.15 tibble_1.4.2 gtable_0.2.0
[13] evaluate_0.10.1 lattice_0.20-35 rlang_0.2.0
[16] Matrix_1.2-14 foreach_1.4.4 yaml_2.1.19
[19] parallel_3.4.4 mvtnorm_1.0-7 ebnm_0.1-11
[22] stringr_1.3.0 knitr_1.20 REBayes_1.3
[25] rprojroot_1.3-2 grid_3.4.4 rmarkdown_1.9
[28] rmeta_3.0 ggplot2_2.2.1 magrittr_1.5
[31] whisker_0.3-2 scales_0.5.0 backports_1.1.2
[34] codetools_0.2-15 htmltools_0.3.6 MASS_7.3-50
[37] assertthat_0.2.0 softImpute_1.4 colorspace_1.3-2
[40] stringi_1.2.2 Rmosek_8.0.69 lazyeval_0.2.1
[43] munsell_0.4.3 doParallel_1.0.11 pscl_1.5.2
[46] truncnorm_1.0-8 SQUAREM_2017.10-1 R.oo_1.22.0
sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] corrplot_0.84 colorRamps_2.3 flashr_0.5-6 plyr_1.8.4
[5] mclust_5.4 mashr_0.2-6 ashr_2.2-7 limma_3.34.9
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 pillar_1.2.2 compiler_3.4.4
[4] git2r_0.21.0 workflowr_1.0.1 R.methodsS3_1.7.1
[7] R.utils_2.6.0 iterators_1.0.9 tools_3.4.4
[10] digest_0.6.15 tibble_1.4.2 gtable_0.2.0
[13] evaluate_0.10.1 lattice_0.20-35 rlang_0.2.0
[16] Matrix_1.2-14 foreach_1.4.4 yaml_2.1.19
[19] parallel_3.4.4 mvtnorm_1.0-7 ebnm_0.1-11
[22] stringr_1.3.0 knitr_1.20 REBayes_1.3
[25] rprojroot_1.3-2 grid_3.4.4 rmarkdown_1.9
[28] rmeta_3.0 ggplot2_2.2.1 magrittr_1.5
[31] whisker_0.3-2 scales_0.5.0 backports_1.1.2
[34] codetools_0.2-15 htmltools_0.3.6 MASS_7.3-50
[37] assertthat_0.2.0 softImpute_1.4 colorspace_1.3-2
[40] stringi_1.2.2 Rmosek_8.0.69 lazyeval_0.2.1
[43] munsell_0.4.3 doParallel_1.0.11 pscl_1.5.2
[46] truncnorm_1.0-8 SQUAREM_2017.10-1 R.oo_1.22.0
This reproducible R Markdown analysis was created with workflowr 1.0.1