Last updated: 2019-01-06
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(1)
The command set.seed(1)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: b04fb6c
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: analysis/include/.DS_Store
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Ignored: docs/.DS_Store
Ignored: output/.DS_Store
Ignored: output/.sos/
Untracked files:
Untracked: analysis/Classify.Rmd
Untracked: analysis/EstimateCorMash.Rmd
Untracked: analysis/EstimateCorMaxGD.Rmd
Untracked: analysis/EstimateCorMaxMCMash.Rmd
Untracked: analysis/HierarchicalFlashSim.Rmd
Untracked: analysis/MashLowSignalGTEx4.Rmd
Untracked: analysis/Mash_GTEx.Rmd
Untracked: analysis/MeanAsh.Rmd
Untracked: analysis/OutlierDetection.Rmd
Untracked: analysis/OutlierDetection2.Rmd
Untracked: analysis/OutlierDetection3.Rmd
Untracked: analysis/OutlierDetection4.Rmd
Untracked: analysis/mash_missing_row.Rmd
Untracked: code/GTExNullModel.R
Untracked: code/MashClassify.R
Untracked: code/MashCorResult.R
Untracked: code/MashCormVResult.R
Untracked: code/MashNULLCorResult.R
Untracked: code/MashSource.R
Untracked: code/Weight_plot.R
Untracked: code/addemV.R
Untracked: code/dsc-differentV/
Untracked: code/dsc-differentV_signal/
Untracked: code/estimate_cor.R
Untracked: code/generateDataV.R
Untracked: code/johnprocess.R
Untracked: code/mV.R
Untracked: code/sim_mean_sig.R
Untracked: code/summary.R
Untracked: data/Blischak_et_al_2015/
Untracked: data/scale_data.rds
Untracked: data/wasp_yuxin/
Untracked: docs/figure/Classify.Rmd/
Untracked: docs/figure/OutlierDetection.Rmd/
Untracked: docs/figure/OutlierDetection2.Rmd/
Untracked: docs/figure/OutlierDetection3.Rmd/
Untracked: docs/figure/Test.Rmd/
Untracked: docs/figure/mash_missing_whole_row_5.Rmd/
Untracked: docs/include/
Untracked: output/AddEMV/
Untracked: output/CovED_UKBio_strong.rds
Untracked: output/CovED_UKBio_strong_Z.rds
Untracked: output/EstCorMLECompare/
Untracked: output/Flash_UKBio_strong.rds
Untracked: output/GTExNULLres/
Untracked: output/GTEx_2.5_nullData.rds
Untracked: output/GTEx_2.5_nullModel.rds
Untracked: output/GTEx_2.5_nullPermData.rds
Untracked: output/GTEx_2.5_nullPermModel.rds
Untracked: output/GTEx_3.5_nullData.rds
Untracked: output/GTEx_3.5_nullModel.rds
Untracked: output/GTEx_3.5_nullPermData.rds
Untracked: output/GTEx_3.5_nullPermModel.rds
Untracked: output/GTEx_3_nullData.rds
Untracked: output/GTEx_3_nullModel.rds
Untracked: output/GTEx_3_nullPermData.rds
Untracked: output/GTEx_3_nullPermModel.rds
Untracked: output/GTEx_4.5_nullData.rds
Untracked: output/GTEx_4.5_nullModel.rds
Untracked: output/GTEx_4.5_nullPermData.rds
Untracked: output/GTEx_4.5_nullPermModel.rds
Untracked: output/GTEx_4_nullData.rds
Untracked: output/GTEx_4_nullModel.rds
Untracked: output/GTEx_4_nullPermData.rds
Untracked: output/GTEx_4_nullPermModel.rds
Untracked: output/MASH.10.em2.result.rds
Untracked: output/MASH.10.mle.result.rds
Untracked: output/MashCorSim--midway/
Untracked: output/Mash_EE_Cov_0_plusR1.rds
Untracked: output/UKBio_mash_model.rds
Untracked: output/WASP/
Untracked: output/diff_v/
Untracked: output/diff_v_signal/
Untracked: output/dsc-mashr-est_v/
Untracked: output/mVIterations/
Untracked: output/mVMLEsubset/
Untracked: output/mVUlist/
Untracked: output/result.em.rds
Unstaged changes:
Modified: analysis/EstimateCorMaxMVSample.Rmd
Modified: analysis/Mash_UKBio.Rmd
Modified: analysis/mash_missing_samplesize.Rmd
Modified: output/Flash_T2_0.rds
Modified: output/Flash_T2_0_mclust.rds
Modified: output/Mash_model_0_plusR1.rds
Modified: output/PresiAddVarCol.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b04fb6c | zouyuxin | 2019-01-06 | wflow_publish(“analysis/WASPmash.Rmd”) |
Loading required package: ashr
dat = readRDS('../data/wasp_yuxin/fastqtl_to_mash_output/wasp.mash.rds')
dat$strong.z[is.infinite(dat$strong.z)] = sign(dat$strong.z[is.infinite(dat$strong.z)]) * 10
dat$random.z[is.infinite(dat$random.z)] = sign(dat$random.z[is.infinite(dat$random.z)]) * 10
dat$strong.z = dat$strong.z[,c(1,8:16,2:7)]
dat$random.z = dat$random.z[,c(1,8:16,2:7)]
data.random = mash_set_data(dat$random.z)
data.strong = mash_set_data(dat$strong.z)
Flash:
my_init_fn <- function(Y, K = 1) {
ret = flashr:::udv_si(Y, K)
pos_sum = sum(ret$v[ret$v > 0])
neg_sum = -sum(ret$v[ret$v < 0])
if (neg_sum > pos_sum) {
return(list(u = -ret$u, d = ret$d, v = -ret$v))
} else
return(ret)
}
flash_pipeline = function(data, ...) {
## current state-of-the art
## suggested by Jason Willwerscheid
## cf: discussion section of
## https://willwerscheid.github.io/MASHvFLASH/MASHvFLASHnn2.html
ebnm_fn = "ebnm_ash"
ebnm_param = list(l = list(mixcompdist = "normal",
optmethod = "mixSQP"),
f = list(mixcompdist = "+uniform",
optmethod = "mixSQP"))
##
fl_g <- flashr:::flash_greedy_workhorse(data,
var_type = "constant",
ebnm_fn = ebnm_fn,
ebnm_param = ebnm_param,
init_fn = "my_init_fn",
stopping_rule = "factors",
tol = 1e-3,
verbose_output = "odF")
fl_b <- flashr:::flash_backfit_workhorse(data,
f_init = fl_g,
var_type = "constant",
ebnm_fn = ebnm_fn,
ebnm_param = ebnm_param,
stopping_rule = "factors",
tol = 1e-3,
verbose_output = "odF")
return(fl_b)
}
cov_flash = function(data, subset = NULL, non_canonical = FALSE, save_model = NULL) {
if(is.null(subset)) subset = 1:mashr:::n_effects(data)
b.center = apply(data$Bhat, 2, function(x) x - mean(x))
## Only keep factors with at least two values greater than 1 / sqrt(n)
find_nonunique_effects <- function(fl) {
thresh <- 1/sqrt(ncol(fl$fitted_values))
vals_above_avg <- colSums(fl$ldf$f > thresh)
nonuniq_effects <- which(vals_above_avg > 1)
return(fl$ldf$f[, nonuniq_effects, drop = FALSE])
}
fmodel = flash_pipeline(b.center)
if (non_canonical)
flash_f = find_nonunique_effects(fmodel)
else
flash_f = fmodel$ldf$f
## row.names(flash_f) = colnames(b)
if (!is.null(save_model)) saveRDS(list(model=fmodel, factors=flash_f), save_model)
if(ncol(flash_f) == 0){
U.flash = list("tFLASH" = t(fmodel$fitted_values) %*% fmodel$fitted_values / nrow(fmodel$fitted_values))
} else{
U.flash = c(cov_from_factors(t(as.matrix(flash_f)), "FLASH"),
list("tFLASH" = t(fmodel$fitted_values) %*% fmodel$fitted_values / nrow(fmodel$fitted_values)))
}
return(U.flash)
}
U.f = cov_flash(data.strong, non_canonical = TRUE, save_model = '../output/WASP/flash_model.rds')
saveRDS(U.f, '../output/WASP/flash_cov.rds')
fl_model = readRDS('../output/WASP/flash_model.rds')$model
factors = readRDS('../output/WASP/flash_model.rds')$factors
par(mfrow = c(1, 3))
for(k in 1:3){
barplot(factors[,k], main=paste0("Factor ", k), names.arg = 0:15)
}
fll_model = flash_pipeline(fl_model$ldf$l)
saveRDS(fll_model, '../output/WASP/flash_loading_model.rds')
U.pca = cov_pca(data.strong, 5)
U.ed = cov_ed(data.strong, c(U.f, U.pca))
U.ed = readRDS('../output/WASP/Ued.rds')
U.c = cov_canonical(data.random)
m.ignore = mash(data.random, c(U.c, U.ed), outputlevel = 1)
m.ignore$result = mash_compute_posterior_matrices(m.ignore, data.strong)
V.simple = estimate_null_correlation_simple(data.random)
data.random.V.simple = mash_update_data(data.random, V = V.simple)
m.simple = mash(data.random.V.simple, c(U.c, U.ed), outputlevel = 1)
data.strong.V.simple = mash_update_data(data.strong, V = V.simple)
m.simple$result = mash_compute_posterior_matrices(m.simple, data.strong.V.simple)
set.seed(1)
random.subset = sample(1:nrow(gtex$random.b),5000)
data.random.s = mash_set_data(gtex$random.b[random.subset,], gtex$random.s[random.subset,])
current = estimate_null_correlation(data.random.s, c(U.c, U.ed), max_iter = 20)
V.current = current$V
data.random.V.current = mash_update_data(data.random, V = V.current)
m.current = mash(data.random.V.current, c(U.c, U.ed), outputlevel = 1)
data.strong = mash_update_data(data.strong, V = V.current)
m.current$result = mash_compute_posterior_matrices(m.current, data.strong)
# read model
m_ignore = readRDS('../output/WASP/m_ignore_post.rds')
m_simple = readRDS('../output/WASP/m_simple_post.rds')
m_current = readRDS('../output/WASP/m_current_post.rds')
colnames(V.simple) = 0:15
row.names(V.simple) = 0:15
corrplot::corrplot(V.simple, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=colorRampPalette(c("blue", "white", "red"))(200), cl.lim = c(-1,1), title = 'Simple', mar=c(0,0,5,0))
V.current = readRDS('../output/WASP/currentV.rds')
V.current = V.current$V
colnames(V.current) = 0:15
row.names(V.current) = 0:15
corrplot::corrplot(V.current, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=colorRampPalette(c("blue", "white", "red"))(200), cl.lim = c(-1,1), title = 'Current', mar=c(0,0,5,0))
tmp = cbind(c(get_loglik(m_ignore), get_loglik(m_simple), get_loglik(m_current)))
row.names(tmp) = c('Ignore', 'Simple', 'Current')
colnames(tmp) = 'log likelihood'
tmp %>% kable() %>% kable_styling()
log likelihood | |
---|---|
Ignore | -442320.6 |
Simple | -429581.7 |
Current | -428237.2 |
par(mfrow=c(1,3))
barplot(get_estimated_pi(m_ignore), las=2, cex.names = 0.7, main = 'Ignore')
barplot(get_estimated_pi(m_simple), las=2, cex.names = 0.7, main = 'Simple')
barplot(get_estimated_pi(m_current), las=2, cex.names = 0.7, main = 'Current')
Number of significant:
numsig = c(length(get_significant_results(m_ignore)),
length(get_significant_results(m_simple)),
length(get_significant_results(m_current)))
tmp = cbind(numsig)
row.names(tmp) = c('Ignore', 'Simple', 'Current')
colnames(tmp) = c('# significance')
tmp %>% kable() %>% kable_styling()
# significance | |
---|---|
Ignore | 5872 |
Simple | 2983 |
Current | 1617 |
The intersection of significance results:
length(intersect(get_significant_results(m_simple), get_significant_results(m_current)))
[1] 1526
length(intersect(get_significant_results(m_ignore), get_significant_results(m_simple)))
[1] 2981
length(intersect(get_significant_results(m_current), get_significant_results(m_ignore)))
[1] 1593
stronggene = data.frame(dat$strong.z[739,])
colnames(stronggene) = 'EffectSize'
stronggene$Group = 0:15
stronggene$se = dat$strong.s[739,]
p1 = ggplot(stronggene, aes(y = EffectSize, x = Group)) +
geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000085491') + ylim(c(-10,-2)) + geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) +
theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneSimple = data.frame(m_simple$result$PosteriorMean[739,])
colnames(stronggeneSimple) = 'EffectSize'
stronggeneSimple$Group = 0:15
stronggeneSimple$se = m_simple$result$PosteriorSD[739,]
p2 = ggplot(stronggeneSimple, aes(y = EffectSize, x = Group)) +
geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000085491 Simple') + ylim(c(-10,-2)) +
geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) +
theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneCurrent = data.frame(m_current$result$PosteriorMean[739,])
colnames(stronggeneCurrent) = 'EffectSize'
stronggeneCurrent$Group = 0:15
stronggeneCurrent$se = m_current$result$PosteriorSD[739,]
p3 = ggplot(stronggeneCurrent, aes(y = EffectSize, x = Group)) +
geom_point(show.legend = FALSE) + ylim(c(-10,-2)) + coord_flip() + ggtitle('ENSG00000085491 Current') +
geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) +
theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
grid.arrange(p1, p2, p3, nrow = 1)
The gene significant in simple
, not in current
stronggene = data.frame(dat$strong.z[5111,])
colnames(stronggene) = 'EffectSize'
stronggene$Group = 0:15
stronggene$se = dat$strong.s[5111,]
p1 = ggplot(stronggene, aes(y = EffectSize, x = Group)) +
geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000173473') + ylim(c(-6,3)) + geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) +
theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneSimple = data.frame(m_simple$result$PosteriorMean[5111,])
colnames(stronggeneSimple) = 'EffectSize'
stronggeneSimple$Group = 0:15
stronggeneSimple$se = m_simple$result$PosteriorSD[5111,]
p2 = ggplot(stronggeneSimple, aes(y = EffectSize, x = Group)) +
geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000173473 Simple') + ylim(c(-6,3)) +
geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) +
theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneCurrent = data.frame(m_current$result$PosteriorMean[5111,])
colnames(stronggeneCurrent) = 'EffectSize'
stronggeneCurrent$Group = 0:15
stronggeneCurrent$se = m_current$result$PosteriorSD[5111,]
p3 = ggplot(stronggeneCurrent, aes(y = EffectSize, x = Group)) +
geom_point(show.legend = FALSE) + ylim(c(-6,3)) + coord_flip() + ggtitle('ENSG00000173473 Current') +
geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) +
theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
grid.arrange(p1, p2, p3, nrow = 1)
The pairwise sharing by magnitude
x <- get_pairwise_sharing(m_ignore)
colnames(x) <- 0:15
rownames(x) <- 0:15
clrs=colorRampPalette(rev(c('darkred', 'red','orange','yellow','cadetblue1', 'cyan', 'dodgerblue4', 'blue','darkorchid1','lightgreen','green', 'forestgreen','darkolivegreen')))(200)
corrplot::corrplot(x, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=clrs, cl.lim = c(0,1), title = 'Ignore', mar=c(0,0,5,0))
x <- get_pairwise_sharing(m_simple)
colnames(x) <- 0:15
rownames(x) <- 0:15
corrplot::corrplot(x, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=clrs, cl.lim = c(0,1), title = 'Simple', mar=c(0,0,5,0))
x <- get_pairwise_sharing(m_current)
colnames(x) <- 0:15
rownames(x) <- 0:15
corrplot::corrplot(x, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=clrs, cl.lim = c(0,1), title = 'Current', mar=c(0,0,5,0))
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gridExtra_2.3 ggplot2_3.1.0 kableExtra_0.9.0 knitr_1.20
[5] mashr_0.2.19.0555 ashr_2.2-26 mixsqp_0.1-93 flashr_0.6-3
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 mvtnorm_1.0-8 lattice_0.20-35
[4] assertthat_0.2.0 rprojroot_1.3-2 digest_0.6.18
[7] foreach_1.4.4 truncnorm_1.0-8 R6_2.3.0
[10] plyr_1.8.4 backports_1.1.2 evaluate_0.12
[13] httr_1.3.1 highr_0.7 pillar_1.3.1
[16] rlang_0.3.0.1 lazyeval_0.2.1 pscl_1.5.2
[19] rstudioapi_0.8 whisker_0.3-2 R.utils_2.7.0
[22] R.oo_1.22.0 Matrix_1.2-14 rmarkdown_1.10
[25] labeling_0.3 readr_1.1.1 stringr_1.3.1
[28] munsell_0.5.0 compiler_3.5.1 pkgconfig_2.0.2
[31] SQUAREM_2017.10-1 htmltools_0.3.6 tidyselect_0.2.5
[34] tibble_1.4.2 workflowr_1.1.1 codetools_0.2-15
[37] viridisLite_0.3.0 crayon_1.3.4 dplyr_0.7.6
[40] withr_2.1.2 MASS_7.3-50 R.methodsS3_1.7.1
[43] grid_3.5.1 gtable_0.2.0 git2r_0.23.0
[46] magrittr_1.5 scales_1.0.0 stringi_1.2.4
[49] reshape2_1.4.3 doParallel_1.0.14 bindrcpp_0.2.2
[52] xml2_1.2.0 rmeta_3.0 iterators_1.0.10
[55] tools_3.5.1 glue_1.3.0 softImpute_1.4
[58] purrr_0.2.5 hms_0.4.2 abind_1.4-5
[61] parallel_3.5.1 yaml_2.2.0 colorspace_1.3-2
[64] rvest_0.3.2 corrplot_0.84 bindr_0.1.1
This reproducible R Markdown analysis was created with workflowr 1.1.1