Histogram of correlated z scores, Part 3

Last updated: 2018-05-12

workflowr checks: (Click a bullet for more information)

✔ R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed: set.seed(12345)

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

✔ Repository version: ddf9062

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/BH_robustness_cache/
    Ignored:    analysis/FDR_Null_cache/
    Ignored:    analysis/FDR_null_betahat_cache/
    Ignored:    analysis/Rmosek_cache/
    Ignored:    analysis/StepDown_cache/
    Ignored:    analysis/alternative2_cache/
    Ignored:    analysis/alternative_cache/
    Ignored:    analysis/ash_gd_cache/
    Ignored:    analysis/average_cor_gtex_2_cache/
    Ignored:    analysis/average_cor_gtex_cache/
    Ignored:    analysis/brca_cache/
    Ignored:    analysis/cash_deconv_cache/
    Ignored:    analysis/cash_fdr_1_cache/
    Ignored:    analysis/cash_fdr_2_cache/
    Ignored:    analysis/cash_fdr_3_cache/
    Ignored:    analysis/cash_fdr_4_cache/
    Ignored:    analysis/cash_fdr_5_cache/
    Ignored:    analysis/cash_fdr_6_cache/
    Ignored:    analysis/cash_plots_cache/
    Ignored:    analysis/cash_sim_1_cache/
    Ignored:    analysis/cash_sim_2_cache/
    Ignored:    analysis/cash_sim_3_cache/
    Ignored:    analysis/cash_sim_4_cache/
    Ignored:    analysis/cash_sim_5_cache/
    Ignored:    analysis/cash_sim_6_cache/
    Ignored:    analysis/cash_sim_7_cache/
    Ignored:    analysis/correlated_z_2_cache/
    Ignored:    analysis/correlated_z_3_cache/
    Ignored:    analysis/correlated_z_cache/
    Ignored:    analysis/create_null_cache/
    Ignored:    analysis/cutoff_null_cache/
    Ignored:    analysis/design_matrix_2_cache/
    Ignored:    analysis/design_matrix_cache/
    Ignored:    analysis/diagnostic_ash_cache/
    Ignored:    analysis/diagnostic_correlated_z_2_cache/
    Ignored:    analysis/diagnostic_correlated_z_3_cache/
    Ignored:    analysis/diagnostic_correlated_z_cache/
    Ignored:    analysis/diagnostic_plot_2_cache/
    Ignored:    analysis/diagnostic_plot_cache/
    Ignored:    analysis/efron_leukemia_cache/
    Ignored:    analysis/fitting_normal_cache/
    Ignored:    analysis/gaussian_derivatives_2_cache/
    Ignored:    analysis/gaussian_derivatives_3_cache/
    Ignored:    analysis/gaussian_derivatives_4_cache/
    Ignored:    analysis/gaussian_derivatives_5_cache/
    Ignored:    analysis/gaussian_derivatives_cache/
    Ignored:    analysis/gd-ash_cache/
    Ignored:    analysis/gd_delta_cache/
    Ignored:    analysis/gd_lik_2_cache/
    Ignored:    analysis/gd_lik_cache/
    Ignored:    analysis/gd_w_cache/
    Ignored:    analysis/knockoff_10_cache/
    Ignored:    analysis/knockoff_2_cache/
    Ignored:    analysis/knockoff_3_cache/
    Ignored:    analysis/knockoff_4_cache/
    Ignored:    analysis/knockoff_5_cache/
    Ignored:    analysis/knockoff_6_cache/
    Ignored:    analysis/knockoff_7_cache/
    Ignored:    analysis/knockoff_8_cache/
    Ignored:    analysis/knockoff_9_cache/
    Ignored:    analysis/knockoff_cache/
    Ignored:    analysis/knockoff_var_cache/
    Ignored:    analysis/marginal_z_alternative_cache/
    Ignored:    analysis/marginal_z_cache/
    Ignored:    analysis/mosek_reg_2_cache/
    Ignored:    analysis/mosek_reg_4_cache/
    Ignored:    analysis/mosek_reg_5_cache/
    Ignored:    analysis/mosek_reg_6_cache/
    Ignored:    analysis/mosek_reg_cache/
    Ignored:    analysis/pihat0_null_cache/
    Ignored:    analysis/plot_diagnostic_cache/
    Ignored:    analysis/poster_obayes17_cache/
    Ignored:    analysis/real_data_simulation_2_cache/
    Ignored:    analysis/real_data_simulation_3_cache/
    Ignored:    analysis/real_data_simulation_4_cache/
    Ignored:    analysis/real_data_simulation_5_cache/
    Ignored:    analysis/real_data_simulation_cache/
    Ignored:    analysis/rmosek_primal_dual_2_cache/
    Ignored:    analysis/rmosek_primal_dual_cache/
    Ignored:    analysis/seqgendiff_cache/
    Ignored:    analysis/simulated_correlated_null_2_cache/
    Ignored:    analysis/simulated_correlated_null_3_cache/
    Ignored:    analysis/simulated_correlated_null_cache/
    Ignored:    analysis/simulation_real_se_2_cache/
    Ignored:    analysis/simulation_real_se_cache/
    Ignored:    analysis/smemo_2_cache/
    Ignored:    data/LSI/
    Ignored:    docs/.DS_Store
    Ignored:    docs/figure/.DS_Store
    Ignored:    output/fig/

Unstaged changes:
    Deleted:    analysis/cash_plots_fdp.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

Expand here to see past versions:

File	Version	Author	Date	Message
Rmd	cc0ab83	Lei Sun	2018-05-11	update
html	0f36d99	LSun	2017-12-21	Build site.
html	853a484	LSun	2017-11-07	Build site.
Rmd	bfde8e5	LSun	2017-03-10	theoretical
html	bfde8e5	LSun	2017-03-10	theoretical

Last updated: 2018-05-12

Code version: ddf9062

Introduction

From previous simulation on FDR on correlated null we get a sense that Benjamini-Hochberg is surprisingly robust under correlation. At the same time, there are cases when BH erroneously produces a large number of false discoveries. Therefore, we take a close look at those data sets BH fails on.

Simulation

z = read.table("../output/z_null_liver_777.txt")
p = read.table("../output/p_null_liver_777.txt")
n = dim(z)

pihat0 = read.table("../output/pihat0_z_null_liver_777.txt")
pihat0 = as.numeric(t(pihat0))

fd.bh = c()

for (i in 1 : n[1]) {
  p_BH = p.adjust(p[i, ], method = "BH")
  fd.bh[i] = sum(p_BH <= 0.05)
}

Result

We take a look that all data sets where BH erroneously produces at least one false discovery (green), compared with \(\hat\pi_0\) produced by ash on these same data sets.

fdn = sum(fd.bh >= 1)
I = order(fd.bh, decreasing = TRUE)[1:fdn]
x = seq(-10, 10, 0.01)
y = dnorm(x)
k = 1
m = n[2]
for (j in I) {
  cat("N0.", k, ": Data Set", j, "; Number of False Discoveries:", fd.bh[j], "; pihat0 =", pihat0[j])
  hist(as.numeric(z[j, ]), xlab = "z scores", freq = FALSE, ylim = c(0, 0.45), nclass = 100, main = "10000 z scores, 100 bins")
  lines(x, y, col = "red")
  qqnorm(as.numeric(z[j, ]), main = "Normal Q-Q plot for z scores")
  abline(0, 1)
  plot(sort(as.numeric(p[j, ])), xlab = "Order", ylab = "Ordered p value", main = "All p values")
  abline(0, 1 / m, col = "blue")
  abline(0, 0.05 / m, col = "red")
  legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
  
  pj = sort(as.numeric(p[j, ]))

  plot(pj[1:max(100, fd.bh[j])], xlab = "Order", ylab = "Ordered p value", main = "Zoom-in to 100 smallest p values", ylim = c(0, pj[max(100, fd.bh[j])]))
  abline(0, 1 / m, col = "blue")
  points(pj[1:fd.bh[j]], col = "green", pch = 19)
  abline(0, 0.05 / m, col = "red")
  legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
  
  plot(pj[pj <= 0.01], xlab = "Order", ylab = "Ordered p value", main = "Zoom-in to p values <= 0.01", ylim = c(0, 0.01))
  abline(0, 1 / m, col = "blue")
  points(pj[1:fd.bh[j]], col = "green", pch = 19)
  abline(0, 0.05 / m, col = "red")
  legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
  
  plot(pj[pj <= 0.05], xlab = "Order", ylab = "Ordered p value", main = "Zoom-in to p values <= 0.05", ylim = c(0, 0.05))
  abline(0, 1 / m, col = "blue")
  points(pj[1:fd.bh[j]], col = "green", pch = 19)
  abline(0, 0.05 / m, col = "red")
  legend("top", lty = 1, col = c("blue", "red"), c("Uniform", "BH, FDR = 0.05"))
  
  k = k + 1
}

N0. 1 : Data Set 355 ; Number of False Discoveries: 639 ; pihat0 = 0.04750946

N0. 2 : Data Set 327 ; Number of False Discoveries: 489 ; pihat0 = 0.1522796

N0. 3 : Data Set 23 ; Number of False Discoveries: 408 ; pihat0 = 0.03139009

N0. 4 : Data Set 122 ; Number of False Discoveries: 331 ; pihat0 = 0.04301549

N0. 5 : Data Set 783 ; Number of False Discoveries: 170 ; pihat0 = 0.06208376

N0. 6 : Data Set 749 ; Number of False Discoveries: 114 ; pihat0 = 0.02583276

N0. 7 : Data Set 724 ; Number of False Discoveries: 79 ; pihat0 = 0.01606004

N0. 8 : Data Set 56 ; Number of False Discoveries: 35 ; pihat0 = 0.04829662

N0. 9 : Data Set 840 ; Number of False Discoveries: 28 ; pihat0 = 0.02316832

N0. 10 : Data Set 858 ; Number of False Discoveries: 16 ; pihat0 = 0.05110069

N0. 11 : Data Set 771 ; Number of False Discoveries: 12 ; pihat0 = 0.04316169

N0. 12 : Data Set 389 ; Number of False Discoveries: 11 ; pihat0 = 0.03156173

N0. 13 : Data Set 485 ; Number of False Discoveries: 9 ; pihat0 = 0.04794909

N0. 14 : Data Set 77 ; Number of False Discoveries: 7 ; pihat0 = 0.05151585

N0. 15 : Data Set 503 ; Number of False Discoveries: 7 ; pihat0 = 0.1545398

N0. 16 : Data Set 984 ; Number of False Discoveries: 7 ; pihat0 = 0.04567195

N0. 17 : Data Set 360 ; Number of False Discoveries: 6 ; pihat0 = 0.0308234

N0. 18 : Data Set 522 ; Number of False Discoveries: 4 ; pihat0 = 0.02083846

N0. 19 : Data Set 51 ; Number of False Discoveries: 3 ; pihat0 = 0.2435437

N0. 20 : Data Set 316 ; Number of False Discoveries: 3 ; pihat0 = 0.2961358

N0. 21 : Data Set 663 ; Number of False Discoveries: 3 ; pihat0 = 0.2818026

N0. 22 : Data Set 274 ; Number of False Discoveries: 2 ; pihat0 = 0.08580661

N0. 23 : Data Set 901 ; Number of False Discoveries: 2 ; pihat0 = 0.9997959

N0. 24 : Data Set 912 ; Number of False Discoveries: 2 ; pihat0 = 0.106048

N0. 25 : Data Set 22 ; Number of False Discoveries: 1 ; pihat0 = 0.03271534

N0. 26 : Data Set 31 ; Number of False Discoveries: 1 ; pihat0 = 0.1874129

N0. 27 : Data Set 187 ; Number of False Discoveries: 1 ; pihat0 = 0.7071115

N0. 28 : Data Set 248 ; Number of False Discoveries: 1 ; pihat0 = 0.5095841

N0. 29 : Data Set 269 ; Number of False Discoveries: 1 ; pihat0 = 0.02787007

N0. 30 : Data Set 285 ; Number of False Discoveries: 1 ; pihat0 = 0.4624418

N0. 31 : Data Set 403 ; Number of False Discoveries: 1 ; pihat0 = 0.0531628

N0. 32 : Data Set 483 ; Number of False Discoveries: 1 ; pihat0 = 0.9998824

N0. 33 : Data Set 501 ; Number of False Discoveries: 1 ; pihat0 = 0.1090034

N0. 34 : Data Set 530 ; Number of False Discoveries: 1 ; pihat0 = 0.4743284

N0. 35 : Data Set 574 ; Number of False Discoveries: 1 ; pihat0 = 0.5260185

N0. 36 : Data Set 575 ; Number of False Discoveries: 1 ; pihat0 = 0.0946117

N0. 37 : Data Set 643 ; Number of False Discoveries: 1 ; pihat0 = 0.3321723

N0. 38 : Data Set 769 ; Number of False Discoveries: 1 ; pihat0 = 0.1187511

N0. 39 : Data Set 778 ; Number of False Discoveries: 1 ; pihat0 = 0.07619716

N0. 40 : Data Set 817 ; Number of False Discoveries: 1 ; pihat0 = 0.999989

N0. 41 : Data Set 837 ; Number of False Discoveries: 1 ; pihat0 = 0.8625374

N0. 42 : Data Set 897 ; Number of False Discoveries: 1 ; pihat0 = 0.8465903

N0. 43 : Data Set 923 ; Number of False Discoveries: 1 ; pihat0 = 0.03239883

N0. 44 : Data Set 955 ; Number of False Discoveries: 1 ; pihat0 = 0.999845

N0. 45 : Data Set 971 ; Number of False Discoveries: 1 ; pihat0 = 0.2387909

N0. 46 : Data Set 997 ; Number of False Discoveries: 1 ; pihat0 = 0.09560788

Session Information

Session information

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_0.12.16      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.21.0      magrittr_1.5      evaluate_0.10.1  
[10] stringi_1.1.6     whisker_0.3-2     R.oo_1.21.0      
[13] R.utils_2.6.0     rmarkdown_1.9     tools_3.4.3      
[16] stringr_1.3.0     yaml_2.1.18       compiler_3.4.3   
[19] htmltools_0.3.6   knitr_1.20

This reproducible R Markdown analysis was created with workflowr 1.0.1

Histogram of correlated \(z\) scores, Part 3

Lei Sun

2017-03-10

Introduction

Simulation

Result

Session Information

Session information