We’ve shown that in many real data sets when we have correlated null \(z\) scores, we can fit their empirical distribution with Gaussian and its derivatives.
But what if we have real signals instead of the global null? Can we fit real signals with Gaussian derivatives with reasonable weights? Let’s start with the most basic case: \(z \sim N(0, \sqrt{2}^2)\) independently. This data set can be seen as generated as follows.
\[ \begin{array}{c} \beta_j \sim N(0, 1)\\ z_j \sim N(\beta_j, 1) \end{array} \]
Let normalized weights \(W_k^s = W_k\sqrt{k!}\) with variance \(\text{var}(W_k^s) = \alpha_k = \bar{\rho_{ij}^k}\).
n = 1e4
m = 5
set.seed(777)
zmat = matrix(rnorm(n * m, 0, sd = sqrt(2)), nrow = m)
library(ashr)
source("../code/ecdfz.R")
res = list()
for (i in 1:3) {
z = zmat[i, ]
p = (1 - pnorm(abs(z))) * 2
bh.fd = sum(p.adjust(p, method = "BH") <= 0.05)
pihat0.ash = get_pi0(ash(z, 1, method = "fdr"))
ecdfz.fit = ecdfz.optimal(z)
res[[i]] = list(z = z, p = p, bh.fd = bh.fd, pihat0.ash = pihat0.ash, ecdfz.fit = ecdfz.fit)
}
Number of Discoveries: 207 ; pihat0 = 0.3309512
Log-likelihood with N(0, 2): -17629.27
Log-likelihood with Gaussian Derivatives: -17634.06
Log-likelihood ratio between true N(0, 2) and fitted Gaussian derivatives: 4.790331
Normalized weights:
1 : -0.0177035771860527 ; 2 : 0.689887136368602 ; 3 : -0.0509478089605657 ; 4 : 0.474815572091947 ; 5 : -0.0255292356243276 ; 6 : 0.197260355735445 ; 7 : 0.0209642526780779 ;
Zoom in to the right tails:
Zoom in to the left tails:
Number of Discoveries: 210 ; pihat0 = 0.3484933
Log-likelihood with N(0, 2): -17695.54
Log-likelihood with Gaussian Derivatives: -17691.84
Log-likelihood ratio between true N(0, 2) and fitted Gaussian derivatives: -3.698983
Normalized weights:
1 : -0.00662167797406594 ; 2 : 0.716277504382886 ; 3 : -0.0310338589748257 ; 4 : 0.594319021051848 ; 5 : -0.00405607696274195 ; 6 : 0.364148970093227 ; 7 : 0.0347791611900783 ; 8 : 0.0862488827136366 ;
Zoom in to the right tails:
Zoom in to the left tails:
Number of Discoveries: 190 ; pihat0 = 0.3438938
Log-likelihood with N(0, 2): -17620.41
Log-likelihood with Gaussian Derivatives: -17620.47
Log-likelihood ratio between true N(0, 2) and fitted Gaussian derivatives: 0.05853415
Normalized weights:
1 : 0.00412988146644346 ; 2 : 0.691757356275142 ; 3 : 0.0624186506260473 ; 4 : 0.566017599686127 ; 5 : 0.0720236790272216 ; 6 : 0.359005476183793 ; 7 : 0.0491535934289797 ; 8 : 0.11765212650001 ;
Zoom in to the right tails:
Zoom in to the left tails:
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cvxr_0.0.0.9009 EQL_1.0-0 ttutils_1.0-1 ashr_2.1.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 knitr_1.15.1 magrittr_1.5
[4] MASS_7.3-45 doParallel_1.0.10 pscl_1.4.9
[7] SQUAREM_2016.10-1 lattice_0.20-34 foreach_1.4.3
[10] stringr_1.1.0 tools_3.3.2 parallel_3.3.2
[13] grid_3.3.2 htmltools_0.3.5 iterators_1.0.8
[16] yaml_2.1.14 rprojroot_1.2 digest_0.6.9
[19] Matrix_1.2-7.1 codetools_0.2-15 evaluate_0.10
[22] rmarkdown_1.3 stringi_1.1.2 backports_1.0.5
[25] truncnorm_1.0-7
This R Markdown site was created with workflowr