Last updated: 2018-05-12
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(12345)
The command set.seed(12345)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: ddf9062
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/BH_robustness_cache/
Ignored: analysis/FDR_Null_cache/
Ignored: analysis/FDR_null_betahat_cache/
Ignored: analysis/Rmosek_cache/
Ignored: analysis/StepDown_cache/
Ignored: analysis/alternative2_cache/
Ignored: analysis/alternative_cache/
Ignored: analysis/ash_gd_cache/
Ignored: analysis/average_cor_gtex_2_cache/
Ignored: analysis/average_cor_gtex_cache/
Ignored: analysis/brca_cache/
Ignored: analysis/cash_deconv_cache/
Ignored: analysis/cash_fdr_1_cache/
Ignored: analysis/cash_fdr_2_cache/
Ignored: analysis/cash_fdr_3_cache/
Ignored: analysis/cash_fdr_4_cache/
Ignored: analysis/cash_fdr_5_cache/
Ignored: analysis/cash_fdr_6_cache/
Ignored: analysis/cash_plots_cache/
Ignored: analysis/cash_sim_1_cache/
Ignored: analysis/cash_sim_2_cache/
Ignored: analysis/cash_sim_3_cache/
Ignored: analysis/cash_sim_4_cache/
Ignored: analysis/cash_sim_5_cache/
Ignored: analysis/cash_sim_6_cache/
Ignored: analysis/cash_sim_7_cache/
Ignored: analysis/correlated_z_2_cache/
Ignored: analysis/correlated_z_3_cache/
Ignored: analysis/correlated_z_cache/
Ignored: analysis/create_null_cache/
Ignored: analysis/cutoff_null_cache/
Ignored: analysis/design_matrix_2_cache/
Ignored: analysis/design_matrix_cache/
Ignored: analysis/diagnostic_ash_cache/
Ignored: analysis/diagnostic_correlated_z_2_cache/
Ignored: analysis/diagnostic_correlated_z_3_cache/
Ignored: analysis/diagnostic_correlated_z_cache/
Ignored: analysis/diagnostic_plot_2_cache/
Ignored: analysis/diagnostic_plot_cache/
Ignored: analysis/efron_leukemia_cache/
Ignored: analysis/fitting_normal_cache/
Ignored: analysis/gaussian_derivatives_2_cache/
Ignored: analysis/gaussian_derivatives_3_cache/
Ignored: analysis/gaussian_derivatives_4_cache/
Ignored: analysis/gaussian_derivatives_5_cache/
Ignored: analysis/gaussian_derivatives_cache/
Ignored: analysis/gd-ash_cache/
Ignored: analysis/gd_delta_cache/
Ignored: analysis/gd_lik_2_cache/
Ignored: analysis/gd_lik_cache/
Ignored: analysis/gd_w_cache/
Ignored: analysis/knockoff_10_cache/
Ignored: analysis/knockoff_2_cache/
Ignored: analysis/knockoff_3_cache/
Ignored: analysis/knockoff_4_cache/
Ignored: analysis/knockoff_5_cache/
Ignored: analysis/knockoff_6_cache/
Ignored: analysis/knockoff_7_cache/
Ignored: analysis/knockoff_8_cache/
Ignored: analysis/knockoff_9_cache/
Ignored: analysis/knockoff_cache/
Ignored: analysis/knockoff_var_cache/
Ignored: analysis/marginal_z_alternative_cache/
Ignored: analysis/marginal_z_cache/
Ignored: analysis/mosek_reg_2_cache/
Ignored: analysis/mosek_reg_4_cache/
Ignored: analysis/mosek_reg_5_cache/
Ignored: analysis/mosek_reg_6_cache/
Ignored: analysis/mosek_reg_cache/
Ignored: analysis/pihat0_null_cache/
Ignored: analysis/plot_diagnostic_cache/
Ignored: analysis/poster_obayes17_cache/
Ignored: analysis/real_data_simulation_2_cache/
Ignored: analysis/real_data_simulation_3_cache/
Ignored: analysis/real_data_simulation_4_cache/
Ignored: analysis/real_data_simulation_5_cache/
Ignored: analysis/real_data_simulation_cache/
Ignored: analysis/rmosek_primal_dual_2_cache/
Ignored: analysis/rmosek_primal_dual_cache/
Ignored: analysis/seqgendiff_cache/
Ignored: analysis/simulated_correlated_null_2_cache/
Ignored: analysis/simulated_correlated_null_3_cache/
Ignored: analysis/simulated_correlated_null_cache/
Ignored: analysis/simulation_real_se_2_cache/
Ignored: analysis/simulation_real_se_cache/
Ignored: analysis/smemo_2_cache/
Ignored: data/LSI/
Ignored: docs/.DS_Store
Ignored: docs/figure/.DS_Store
Ignored: output/fig/
Unstaged changes:
Deleted: analysis/cash_plots_fdp.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Last updated: 2018-05-12
Code version: ddf9062
From previous simulation on FDR on correlated null we get a sense that Benjamini-Hochberg is surprisingly robust under correlation. At the same time, there are cases when BH erroneously produces a large number of false discoveries. Therefore, we take a close look at those data sets BH fails on.
z = read.table("../output/z_null_liver_777.txt")
p = read.table("../output/p_null_liver_777.txt")
n = dim(z)
pihat0 = read.table("../output/pihat0_z_null_liver_777.txt")
pihat0 = as.numeric(t(pihat0))
fd.bh = c()
for (i in 1 : n[1]) {
p_BH = p.adjust(p[i, ], method = "BH")
fd.bh[i] = sum(p_BH <= 0.05)
}
We take a look that all data sets where BH erroneously produces at least one false discovery (green), compared with \(\hat\pi_0\) produced by ash
on these same data sets.
fdn = sum(fd.bh >= 1)
I = order(fd.bh, decreasing = TRUE)[1:fdn]
x = seq(-10, 10, 0.01)
y = dnorm(x)
k = 1
m = n[2]
for (j in I) {
cat("N0.", k, ": Data Set", j, "; Number of False Discoveries:", fd.bh[j], "; pihat0 =", pihat0[j])
hist(as.numeric(z[j, ]), xlab = "z scores", freq = FALSE, ylim = c(0, 0.45), nclass = 100, main = "10000 z scores, 100 bins")
lines(x, y, col = "red")
qqnorm(as.numeric(z[j, ]), main = "Normal Q-Q plot for z scores")
abline(0, 1)
plot(sort(as.numeric(p[j, ])), xlab = "Order", ylab = "Ordered p value", main = "All p values")
abline(0, 1 / m, col = "blue")
abline(0, 0.05 / m, col = "red")
legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
pj = sort(as.numeric(p[j, ]))
plot(pj[1:max(100, fd.bh[j])], xlab = "Order", ylab = "Ordered p value", main = "Zoom-in to 100 smallest p values", ylim = c(0, pj[max(100, fd.bh[j])]))
abline(0, 1 / m, col = "blue")
points(pj[1:fd.bh[j]], col = "green", pch = 19)
abline(0, 0.05 / m, col = "red")
legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
plot(pj[pj <= 0.01], xlab = "Order", ylab = "Ordered p value", main = "Zoom-in to p values <= 0.01", ylim = c(0, 0.01))
abline(0, 1 / m, col = "blue")
points(pj[1:fd.bh[j]], col = "green", pch = 19)
abline(0, 0.05 / m, col = "red")
legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
plot(pj[pj <= 0.05], xlab = "Order", ylab = "Ordered p value", main = "Zoom-in to p values <= 0.05", ylim = c(0, 0.05))
abline(0, 1 / m, col = "blue")
points(pj[1:fd.bh[j]], col = "green", pch = 19)
abline(0, 0.05 / m, col = "red")
legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
k = k + 1
}
N0. 1 : Data Set 355 ; Number of False Discoveries: 639 ; pihat0 = 0.04750946
N0. 2 : Data Set 327 ; Number of False Discoveries: 489 ; pihat0 = 0.1522796
N0. 3 : Data Set 23 ; Number of False Discoveries: 408 ; pihat0 = 0.03139009
N0. 4 : Data Set 122 ; Number of False Discoveries: 331 ; pihat0 = 0.04301549
N0. 5 : Data Set 783 ; Number of False Discoveries: 170 ; pihat0 = 0.06208376
N0. 6 : Data Set 749 ; Number of False Discoveries: 114 ; pihat0 = 0.02583276
N0. 7 : Data Set 724 ; Number of False Discoveries: 79 ; pihat0 = 0.01606004
N0. 8 : Data Set 56 ; Number of False Discoveries: 35 ; pihat0 = 0.04829662
N0. 9 : Data Set 840 ; Number of False Discoveries: 28 ; pihat0 = 0.02316832
N0. 10 : Data Set 858 ; Number of False Discoveries: 16 ; pihat0 = 0.05110069
N0. 11 : Data Set 771 ; Number of False Discoveries: 12 ; pihat0 = 0.04316169
N0. 12 : Data Set 389 ; Number of False Discoveries: 11 ; pihat0 = 0.03156173
N0. 13 : Data Set 485 ; Number of False Discoveries: 9 ; pihat0 = 0.04794909
N0. 14 : Data Set 77 ; Number of False Discoveries: 7 ; pihat0 = 0.05151585
N0. 15 : Data Set 503 ; Number of False Discoveries: 7 ; pihat0 = 0.1545398
N0. 16 : Data Set 984 ; Number of False Discoveries: 7 ; pihat0 = 0.04567195
N0. 17 : Data Set 360 ; Number of False Discoveries: 6 ; pihat0 = 0.0308234
N0. 18 : Data Set 522 ; Number of False Discoveries: 4 ; pihat0 = 0.02083846
N0. 19 : Data Set 51 ; Number of False Discoveries: 3 ; pihat0 = 0.2435437
N0. 20 : Data Set 316 ; Number of False Discoveries: 3 ; pihat0 = 0.2961358
N0. 21 : Data Set 663 ; Number of False Discoveries: 3 ; pihat0 = 0.2818026
N0. 22 : Data Set 274 ; Number of False Discoveries: 2 ; pihat0 = 0.08580661
N0. 23 : Data Set 901 ; Number of False Discoveries: 2 ; pihat0 = 0.9997959
N0. 24 : Data Set 912 ; Number of False Discoveries: 2 ; pihat0 = 0.106048
N0. 25 : Data Set 22 ; Number of False Discoveries: 1 ; pihat0 = 0.03271534
N0. 26 : Data Set 31 ; Number of False Discoveries: 1 ; pihat0 = 0.1874129
N0. 27 : Data Set 187 ; Number of False Discoveries: 1 ; pihat0 = 0.7071115
N0. 28 : Data Set 248 ; Number of False Discoveries: 1 ; pihat0 = 0.5095841
N0. 29 : Data Set 269 ; Number of False Discoveries: 1 ; pihat0 = 0.02787007
N0. 30 : Data Set 285 ; Number of False Discoveries: 1 ; pihat0 = 0.4624418
N0. 31 : Data Set 403 ; Number of False Discoveries: 1 ; pihat0 = 0.0531628
N0. 32 : Data Set 483 ; Number of False Discoveries: 1 ; pihat0 = 0.9998824
N0. 33 : Data Set 501 ; Number of False Discoveries: 1 ; pihat0 = 0.1090034
N0. 34 : Data Set 530 ; Number of False Discoveries: 1 ; pihat0 = 0.4743284
N0. 35 : Data Set 574 ; Number of False Discoveries: 1 ; pihat0 = 0.5260185
N0. 36 : Data Set 575 ; Number of False Discoveries: 1 ; pihat0 = 0.0946117
N0. 37 : Data Set 643 ; Number of False Discoveries: 1 ; pihat0 = 0.3321723
N0. 38 : Data Set 769 ; Number of False Discoveries: 1 ; pihat0 = 0.1187511
N0. 39 : Data Set 778 ; Number of False Discoveries: 1 ; pihat0 = 0.07619716
N0. 40 : Data Set 817 ; Number of False Discoveries: 1 ; pihat0 = 0.999989
N0. 41 : Data Set 837 ; Number of False Discoveries: 1 ; pihat0 = 0.8625374
N0. 42 : Data Set 897 ; Number of False Discoveries: 1 ; pihat0 = 0.8465903
N0. 43 : Data Set 923 ; Number of False Discoveries: 1 ; pihat0 = 0.03239883
N0. 44 : Data Set 955 ; Number of False Discoveries: 1 ; pihat0 = 0.999845
N0. 45 : Data Set 971 ; Number of False Discoveries: 1 ; pihat0 = 0.2387909
N0. 46 : Data Set 997 ; Number of False Discoveries: 1 ; pihat0 = 0.09560788
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] workflowr_1.0.1 Rcpp_0.12.16 digest_0.6.15
[4] rprojroot_1.3-2 R.methodsS3_1.7.1 backports_1.1.2
[7] git2r_0.21.0 magrittr_1.5 evaluate_0.10.1
[10] stringi_1.1.6 whisker_0.3-2 R.oo_1.21.0
[13] R.utils_2.6.0 rmarkdown_1.9 tools_3.4.3
[16] stringr_1.3.0 yaml_2.1.18 compiler_3.4.3
[19] htmltools_0.3.6 knitr_1.20
This reproducible R Markdown analysis was created with workflowr 1.0.1