Last updated: 2018-05-15
workflowr checks:
✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
✔ Seed: set.seed(12345)
The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
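As a minimal illustration of the point about seeds (not part of the original analysis; the variable names are purely illustrative), resetting the same seed before a random draw reproduces that draw exactly:

```r
# Illustrative only: the same seed yields the same pseudo-random draws.
set.seed(12345)
first_draw <- rnorm(5)

set.seed(12345)
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the next draw continues the stream and differs.
third_draw <- rnorm(5)
identical(first_draw, third_draw)   # FALSE
```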
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 388e65e
Make sure that all relevant files have been committed to Git before generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/BH_robustness_cache/
Ignored: analysis/FDR_Null_cache/
Ignored: analysis/FDR_null_betahat_cache/
Ignored: analysis/Rmosek_cache/
Ignored: analysis/StepDown_cache/
Ignored: analysis/alternative2_cache/
Ignored: analysis/alternative_cache/
Ignored: analysis/ash_gd_cache/
Ignored: analysis/average_cor_gtex_2_cache/
Ignored: analysis/average_cor_gtex_cache/
Ignored: analysis/brca_cache/
Ignored: analysis/cash_deconv_cache/
Ignored: analysis/cash_fdr_1_cache/
Ignored: analysis/cash_fdr_2_cache/
Ignored: analysis/cash_fdr_3_cache/
Ignored: analysis/cash_fdr_4_cache/
Ignored: analysis/cash_fdr_5_cache/
Ignored: analysis/cash_fdr_6_cache/
Ignored: analysis/cash_plots_cache/
Ignored: analysis/cash_sim_1_cache/
Ignored: analysis/cash_sim_2_cache/
Ignored: analysis/cash_sim_3_cache/
Ignored: analysis/cash_sim_4_cache/
Ignored: analysis/cash_sim_5_cache/
Ignored: analysis/cash_sim_6_cache/
Ignored: analysis/cash_sim_7_cache/
Ignored: analysis/correlated_z_2_cache/
Ignored: analysis/correlated_z_3_cache/
Ignored: analysis/correlated_z_cache/
Ignored: analysis/create_null_cache/
Ignored: analysis/cutoff_null_cache/
Ignored: analysis/design_matrix_2_cache/
Ignored: analysis/design_matrix_cache/
Ignored: analysis/diagnostic_ash_cache/
Ignored: analysis/diagnostic_correlated_z_2_cache/
Ignored: analysis/diagnostic_correlated_z_3_cache/
Ignored: analysis/diagnostic_correlated_z_cache/
Ignored: analysis/diagnostic_plot_2_cache/
Ignored: analysis/diagnostic_plot_cache/
Ignored: analysis/efron_leukemia_cache/
Ignored: analysis/fitting_normal_cache/
Ignored: analysis/gaussian_derivatives_2_cache/
Ignored: analysis/gaussian_derivatives_3_cache/
Ignored: analysis/gaussian_derivatives_4_cache/
Ignored: analysis/gaussian_derivatives_5_cache/
Ignored: analysis/gaussian_derivatives_cache/
Ignored: analysis/gd-ash_cache/
Ignored: analysis/gd_delta_cache/
Ignored: analysis/gd_lik_2_cache/
Ignored: analysis/gd_lik_cache/
Ignored: analysis/gd_w_cache/
Ignored: analysis/knockoff_10_cache/
Ignored: analysis/knockoff_2_cache/
Ignored: analysis/knockoff_3_cache/
Ignored: analysis/knockoff_4_cache/
Ignored: analysis/knockoff_5_cache/
Ignored: analysis/knockoff_6_cache/
Ignored: analysis/knockoff_7_cache/
Ignored: analysis/knockoff_8_cache/
Ignored: analysis/knockoff_9_cache/
Ignored: analysis/knockoff_cache/
Ignored: analysis/knockoff_var_cache/
Ignored: analysis/marginal_z_alternative_cache/
Ignored: analysis/marginal_z_cache/
Ignored: analysis/mosek_reg_2_cache/
Ignored: analysis/mosek_reg_4_cache/
Ignored: analysis/mosek_reg_5_cache/
Ignored: analysis/mosek_reg_6_cache/
Ignored: analysis/mosek_reg_cache/
Ignored: analysis/pihat0_null_cache/
Ignored: analysis/plot_diagnostic_cache/
Ignored: analysis/poster_obayes17_cache/
Ignored: analysis/real_data_simulation_2_cache/
Ignored: analysis/real_data_simulation_3_cache/
Ignored: analysis/real_data_simulation_4_cache/
Ignored: analysis/real_data_simulation_5_cache/
Ignored: analysis/real_data_simulation_cache/
Ignored: analysis/rmosek_primal_dual_2_cache/
Ignored: analysis/rmosek_primal_dual_cache/
Ignored: analysis/seqgendiff_cache/
Ignored: analysis/simulated_correlated_null_2_cache/
Ignored: analysis/simulated_correlated_null_3_cache/
Ignored: analysis/simulated_correlated_null_cache/
Ignored: analysis/simulation_real_se_2_cache/
Ignored: analysis/simulation_real_se_cache/
Ignored: analysis/smemo_2_cache/
Ignored: data/LSI/
Ignored: docs/.DS_Store
Ignored: docs/figure/.DS_Store
Ignored: output/fig/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | e05bc83 | LSun | 2018-05-12 | Update to 1.0 |
rmd | cc0ab83 | Lei Sun | 2018-05-11 | update |
html | 807f924 | LSun | 2018-05-11 | Build site. |
rmd | 548070e | LSun | 2018-05-11 | wflow_publish("analysis/gd_w.rmd") |
html | 84c7d5b | LSun | 2018-05-11 | Build site. |
rmd | 5c9aeac | LSun | 2018-05-11 | wflow_publish("analysis/gd_w.rmd") |
html | ed9c0d9 | LSun | 2018-05-11 | Build site. |
rmd | 0ea9f77 | Lei Sun | 2018-05-10 | W |
html | adeab80 | LSun | 2018-05-06 | Build site. |
rmd | 0b0a394 | LSun | 2018-05-06 | wflow_publish(c("analysis/BH_robustness.rmd", "analysis/gd_w.rmd")) |
html | 720d179 | LSun | 2018-04-18 | Build site. |
rmd | b82e2bc | LSun | 2018-04-18 | wflow_publish(c("analysis/gd_w.rmd", "analysis/index.Rmd")) |
rmd | c7c5984 | Lei Sun | 2018-04-15 | ecdfz |
html | 1c2f32e | LSun | 2018-04-15 | Build site. |
rmd | 3b5dcc4 | LSun | 2018-04-15 | wflow_publish("analysis/gd_w.rmd") |
rmd | 1b5d40f | Lei Sun | 2018-04-13 | gd w |
rmd | c766b80 | LSun | 2018-04-13 | add lfsr |
source("../code/gdash_lik.R")
source("../code/gdfit.R")
source("../code/count_to_summary.R")
library(limma)
library(edgeR)
library(ashr)
library(plyr)
library(ggplot2)
library(reshape2)
set.seed(777)
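d <- 10                          # number of latent factors
n <- 1e4                         # number of z-scores
B <- matrix(rnorm(n * d), n, d)  # factor loadings
Sigma <- B %*% t(B) + diag(n)    # covariance: low-rank part plus identity
sigma <- diag(Sigma)             # marginal variances
Rho <- cov2cor(Sigma)            # implied correlation matrix
## average of the l-th powers of the off-diagonal correlations, l = 1, ..., 10
rhobar <- numeric(10)
for (l in seq_along(rhobar)) {
  rhobar[l] <- (sum(Rho^l) - n) / (n * (n - 1))
}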
par(mar = c(5.1, 4.1, 1, 2.1))
hist(Rho[lower.tri(Rho)], xlab = expression(rho[ij]), main = "")
Version | Author | Date |
---|---|---|
720d179 | LSun | 2018-04-18 |
set.seed(20)
z <- rnorm(d)
Z <- B %*% z + rnorm(n)
Z <- Z / sqrt(sigma)
cat("sd(Z) =", sd(Z))
sd(Z) = 1.262205
hist(Z, breaks = 20, prob = TRUE, ylim = c(0, dnorm(0)))
lines(seq(-5, 5, by = 0.1), dnorm(seq(-5, 5, by = 0.1)), col = "blue")
Version | Author | Date |
---|---|---|
720d179 | LSun | 2018-04-18 |
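The simulation above draws each vector of z-scores as (B z + e) / sqrt(diag(Sigma)), so its correlation matrix should be cov2cor(B B' + I). The following is a small-scale sketch that checks this empirically. The sizes d_small, n_small and n_rep are hypothetical and chosen only for speed; they are not from the analysis.

```r
# Hypothetical small-scale check; sizes are illustrative, not from the analysis.
set.seed(1)
d_small <- 3
n_small <- 5
n_rep <- 20000

B_small <- matrix(rnorm(n_small * d_small), n_small, d_small)
Sigma_small <- B_small %*% t(B_small) + diag(n_small)
Rho_small <- cov2cor(Sigma_small)

# Each column is one draw of (B z + e) / sqrt(diag(Sigma)).
Z_rep <- replicate(n_rep, {
  as.vector(B_small %*% rnorm(d_small) + rnorm(n_small)) / sqrt(diag(Sigma_small))
})

# The empirical correlation across draws should be close to the implied one.
max_abs_diff <- max(abs(cor(t(Z_rep)) - Rho_small))
max_abs_diff
```

Each coordinate is marginally N(0, 1) by construction. A single realization of all n coordinates can still have a sample standard deviation well away from 1, as in the sd(Z) output above, because the coordinates share the same d latent factors.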
p <- pnorm(-abs(Z)) * 2
par(mfcol = c(2, 2))
par(mar = c(5.1, 4.1, 3, 2.1))
hist(p, breaks = 100, main = "Correlated", xlab = "p-value")
par(mar = c(5.1, 4.1, 1, 2.1))
plot(-log(p), ylim = range(-log(p), -log(pnorm(-sqrt(2 * log(n))) * 2), -log(0.05 / n)))
abline(h = -log(pnorm(-sqrt(2 * log(n))) * 2), col = "maroon")
abline(h = -log(0.05 / n), col = "red")
abline(h = -log(0.001), col = "green")
abline(h = -log(0.05), col = "blue")
Z <- rnorm(n)
p <- pnorm(-abs(Z)) * 2
par(mar = c(5.1, 4.1, 3, 2.1))
hist(p, breaks = 100, main = "Independent", xlab = "p-value")
par(mar = c(5.1, 4.1, 1, 2.1))
plot(-log(p), ylim = range(-log(p), -log(pnorm(-sqrt(2 * log(n))) * 2), -log(0.05 / n)))
abline(h = -log(pnorm(-sqrt(2 * log(n))) * 2), col = "maroon")
abline(h = -log(0.05 / n), col = "red")
abline(h = -log(0.001), col = "green")
abline(h = -log(0.05), col = "blue")
Version | Author | Date |
---|---|---|
720d179 | LSun | 2018-04-18 |
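The four horizontal lines in the -log(p) panels above mark fixed two-sided p-value cutoffs. As a sketch, the cutoffs can be tabulated on the p-value, |z| and -log(p) scales for n = 1e4. The labels (universal, bonferroni, and so on) are descriptive names assumed here, not names used in the source.

```r
n <- 1e4

# Two-sided p-value cutoffs drawn as horizontal lines in the plots.
p_cutoff <- c(
  universal  = 2 * pnorm(-sqrt(2 * log(n))),  # maroon: |z| = sqrt(2 * log(n))
  bonferroni = 0.05 / n,                      # red
  p_0.001    = 0.001,                         # green
  p_0.05     = 0.05                           # blue
)

cutoffs <- data.frame(
  p = p_cutoff,
  z = qnorm(p_cutoff / 2, lower.tail = FALSE),  # equivalent |z| threshold
  neg_log_p = -log(p_cutoff)                    # height of the line in the plot
)
cutoffs
```

For this n, the Bonferroni cutoff is stricter than the sqrt(2 * log(n)) cutoff, so the red line sits above the maroon one on the -log(p) axis.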
set.seed(777)
nsim <- 1e4
Z.list <- W <- list()
for (i in 1 : nsim) {
z <- rnorm(d)
Z <- B %*% z + rnorm(n)
Z <- Z / sqrt(sigma)
Z.list[[i]] <- Z
Z.GD <- gdfit.mom(Z, 100)
W[[i]] <- Z.GD$w
}
Z.sim <- Z.list
W.sim <- W
r <- readRDS("../data/liver.rds")
top_genes_index = function (g, X) {
return(order(rowSums(X), decreasing = TRUE)[1 : g])
}
lcpm = function (r) {
R = colSums(r)
t(log2(((t(r) + 0.5) / (R + 1)) * 10^6))
}
nsamp <- 5
ngene <- 1e4
Y = lcpm(r)
subset = top_genes_index(ngene, Y)
r = r[subset,]
set.seed(7)
nsim <- 1e4
Z.list <- W <- list()
for (i in 1 : nsim) {
## generate data
counts <- r[, sample(ncol(r), 2 * nsamp)]
design <- model.matrix(~c(rep(0, nsamp), rep(1, nsamp)))
summary <- count_to_summary(counts, design)
Z <- summary$z
Z.list[[i]] <- Z
Z.GD <- gdfit.mom(Z, 100)
W[[i]] <- Z.GD$w
}
Z.gtex <- Z.list
W.gtex <- W
quantile.vec1 <- exp(seq(-21, -5, by = 0.01))
quantile.vec2 <- seq(0.007, 0.993, by = 0.001)
quantile.vec3 <- exp(seq(-5, -21, by = -0.01))
emp.cdf.Z1 <- sapply(quantile.vec1, function(x) {sapply(Z.gtex, function(y) mean(y <= qnorm(x)))})
emp.cdf.Z2 <- sapply(quantile.vec2, function(x) {sapply(Z.gtex, function(y) mean(y <= qnorm(x)))})
emp.cdf.Z3 <- sapply(quantile.vec3, function(x) {sapply(Z.gtex, function(y) mean(y <= -qnorm(x)))})
emp.cdf.Z4 <- sapply(quantile.vec3, function(x) {sapply(Z.gtex, function(y) mean(y > -qnorm(x)))})
ecdf.avg1 <- colMeans(emp.cdf.Z1)
ecdf.avg2 <- colMeans(emp.cdf.Z2)
ecdf.avg3 <- colMeans(emp.cdf.Z3)
ecdf.avg4 <- colMeans(emp.cdf.Z4)
ecdf.avg <- c(ecdf.avg1, ecdf.avg2, ecdf.avg3)
ecdf.tail.avg.conf.int1 <- apply(emp.cdf.Z1, 2, function(x) {t.test(x)$conf.int})
ecdf.tail.avg.conf.int4 <- apply(emp.cdf.Z4, 2, function(x) {t.test(x)$conf.int})
plot(c(qnorm(quantile.vec1), qnorm(quantile.vec2), -qnorm(quantile.vec3)), ecdf.avg, type = "l", col = "red", xlab = "z", ylab = "Cumulative Distribution Function (CDF)")
lines(c(qnorm(quantile.vec1), qnorm(quantile.vec2), -qnorm(quantile.vec3)), c(quantile.vec1, quantile.vec2, pnorm(-qnorm(quantile.vec3))), lty = 2)
legend("bottomright", lty = c(1, 2), col = c("red", "black"), legend = c(expression(bar("F")(z)), expression(Phi(z))))
Version | Author | Date |
---|---|---|
720d179 | LSun | 2018-04-18 |
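The averaged empirical CDF and its confidence band above are computed per evaluation point across the simulated data sets. The reduced sketch below shows the same pattern on a short grid. It uses independent N(0, 1) draws in place of the z-score lists, and z_list_demo and grid are hypothetical names.

```r
# Hypothetical stand-in data: 200 data sets of 1000 independent N(0, 1) draws.
set.seed(1)
z_list_demo <- replicate(200, rnorm(1000), simplify = FALSE)
grid <- c(-2, 0, 2)

# Rows index data sets, columns index evaluation points.
ecdf_mat <- sapply(grid, function(x) sapply(z_list_demo, function(y) mean(y <= x)))

# Average over data sets, with a t-based confidence interval at each point.
ecdf_avg <- colMeans(ecdf_mat)
ecdf_ci <- apply(ecdf_mat, 2, function(col) t.test(col)$conf.int)

rbind(z = grid, average = ecdf_avg,
      lower = ecdf_ci[1, ], upper = ecdf_ci[2, ],
      normal = pnorm(grid))
```

Far in the tails most data sets contribute an empirical CDF of exactly zero, so the lower end of a t-based interval can fall below zero and its logarithm is undefined.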
plot(qnorm(quantile.vec1), log(ecdf.avg1), type = "l",
ylim = range(log(quantile.vec1), log(ecdf.avg1)),
xlab = "z", ylab = "log (CDF)")
lines(qnorm(quantile.vec1), log(quantile.vec1), lty = 2, col = "red")
lines(qnorm(quantile.vec1), log(pnorm(qnorm(quantile.vec1), 0, 1.1)), lty = 2, col = "green")
lines(qnorm(quantile.vec1), log(pnorm(qnorm(quantile.vec1), 0, 1.05)), lty = 2, col = "blue")
polygon(x = c(qnorm(quantile.vec1), rev(qnorm(quantile.vec1))),
y = c(log(ecdf.tail.avg.conf.int1[1, ]), rev(log(ecdf.tail.avg.conf.int1[2, ]))),
border = NA,
col = grDevices::adjustcolor("grey75", alpha.f = 0.5))
Warning in log(ecdf.tail.avg.conf.int1[1, ]): NaNs produced
legend("bottomright", lty = c(1, 2, 2, 2), col = c(1, 2, 4, 3), legend = c(
expression(log(bar("F")(z))),
expression(log(Phi(z))),
expression(log(Phi(z / 1.05))),
expression(log(Phi(z / 1.1)))
))
Version | Author | Date |
---|---|---|
720d179 | LSun | 2018-04-18 |
plot(-qnorm(quantile.vec3), log(ecdf.avg4), type = "l",
ylim = range(log(quantile.vec3), log(ecdf.avg4)),
xlab = "z", ylab = "log (1 - CDF)")
lines(-qnorm(quantile.vec3), log(quantile.vec3), lty = 2, col = "red")
lines(-qnorm(quantile.vec3), log(pnorm(qnorm(quantile.vec3), 0, 1.1)), lty = 2, col = "green")
lines(-qnorm(quantile.vec3), log(pnorm(qnorm(quantile.vec3), 0, 1.05)), lty = 2, col = "blue")
polygon(x = c(-qnorm(quantile.vec3), rev(-qnorm(quantile.vec3))),
y = c(log(ecdf.tail.avg.conf.int4[1, ]), rev(log(ecdf.tail.avg.conf.int4[2, ]))),
border = NA,
col = grDevices::adjustcolor("grey75", alpha.f = 0.5))
legend("bottomleft", lty = c(1, 2, 2, 2), col = c(1, 2, 4, 3), legend = c(
expression(log(1 - bar("F")(z))),
expression(log(1 - Phi(z))),
expression(log(1 - Phi(z / 1.05))),
expression(log(1 - Phi(z / 1.1)))
))
Version | Author | Date |
---|---|---|
720d179 | LSun | 2018-04-18 |
set.seed(777)
nsamp <- 50
nsim <- 1e3
z <- sebetahat <- list()
for (i in 1 : nsim) {
## generate data
counts <- r[, sample(ncol(r), 2 * nsamp)]
design <- model.matrix(~c(rep(0, nsamp), rep(1, nsamp)))
summary <- count_to_summary(counts, design)
z[[i]] <- summary$z
sebetahat[[i]] <- summary$sebetahat
}
sd.vec <- sapply(z, sd)
median.vec <- sapply(z, median)
fd.vec <- sapply(z, function(x) {
p <- pnorm(-abs(x)) * 2
sum(p <= 0.005)
})
sel <- c(834, 211, 397, 748)
par(mfrow = c(2, 2))
for (i in seq_along(sel)) {
fit <- gdfit(z[[sel[i]]], 10)
plot.gdfit(z[[sel[i]]], fit$w, fit$L, legend = FALSE)
}
Version | Author | Date |
---|---|---|
adeab80 | LSun | 2018-05-06 |
set.seed(6)
par(mfrow = c(2, 3))
par(mar = c(4.5, 4.5, 2, 2))
hist(pnorm(-abs(z[[834]])) * 2, prob = TRUE, xlab = "", breaks = 100, main = "(a): Histogram of two-sided p-values")
lines(c(0, 1), c(1, 1), col = "red")
hist(z[[834]], prob = TRUE, breaks = 100, xlab = "", xlim = c(-4.5, -2), main = "(b): Left tail of correlated z-scores")
lines(seq(-6, 6, by = 0.01), dnorm(seq(-6, 6, by = 0.01), 0, sd(z[[834]])), col = "blue")
lines(seq(-6, 6, by = 0.01), dnorm(seq(-6, 6, by = 0.01)), col = "red")
hist(z[[834]], prob = TRUE, breaks = 100, xlab = "", xlim = c(2, 4.5), main = "(c): Right tail of correlated z-scores")
lines(seq(-6, 6, by = 0.01), dnorm(seq(-6, 6, by = 0.01)), col = "red")
p <- pnorm(-abs(z[[834]])) * 2
plot(sample(-log(pnorm(-abs(z[[834]])) * 2)), ylim = c(0, 20), ylab = "-log(p)", main = expression(paste("(d): Correlated ", N(0, 1))))
abline(h = -log(0.005), col = "red")
abline(h = -log(pnorm(-sqrt(2 * log(1e4))) * 2), col = "blue")
abline(h = -log(0.05 / 1e4), col = "green")
plot(-log(pnorm(-abs(rnorm(1e4))) * 2), ylim = c(0, 20), ylab = "-log(p)", main = expression(paste("(e): Independent ", N(0, 1))))
abline(h = -log(0.005), col = "red")
abline(h = -log(pnorm(-sqrt(2 * log(1e4))) * 2), col = "blue")
abline(h = -log(0.05 / 1e4), col = "green")
plot(-log(pnorm(-abs(rnorm(1e4, 0, 1.6))) * 2), ylim = c(0, 20), ylab = "-log(p)", main = expression(paste("(f): Independent ", N(0, 1.6^2))))
abline(h = -log(0.005), col = "red")
abline(h = -log(pnorm(-sqrt(2 * log(1e4))) * 2), col = "blue")
abline(h = -log(0.05 / 1e4), col = "green")
Version | Author | Date |
---|---|---|
adeab80 | LSun | 2018-05-06 |
p.bh <- p.adjust(p, method = "BH")
sum(p.bh <= 0.05)
[1] 575
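The count above comes from p.adjust with method = "BH". As a sketch of what that adjustment does, the Benjamini-Hochberg step-up rule rejects the k smallest p-values, where k is the largest index with p_(k) <= k * alpha / m. The example uses simulated p-values (p_demo, hypothetical and unrelated to the data here) and compares that k with the count from p.adjust.

```r
set.seed(1)
m <- 1000
alpha <- 0.05

# Hypothetical mixture: 900 null z-scores and 100 shifted ones.
z_demo <- c(rnorm(900), rnorm(100, mean = 3))
p_demo <- 2 * pnorm(-abs(z_demo))

# Step-up rule: largest k such that the k-th smallest p-value is <= k * alpha / m.
p_sorted <- sort(p_demo)
passing <- which(p_sorted <= seq_len(m) * alpha / m)
k_manual <- if (length(passing) > 0) max(passing) else 0

# Same count via the BH-adjusted p-values.
k_adjust <- sum(p.adjust(p_demo, method = "BH") <= alpha)

c(manual = k_manual, p.adjust = k_adjust)
```

The BH guarantee is usually stated for independent or positively dependent tests, so a count obtained from strongly correlated null z-scores should be read with that in mind.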
plot(sort(log(p)), cex = 0.25, pch = 19, ylim = c(-19, 0), xlab = "Order", ylab = "log(p)")
set.seed(6)
z.indep <- rnorm(1e4)
points(sort(log(pnorm(-abs(z.indep)) * 2)), cex = 0.25, pch = 19, col = "blue")
z.indep <- rnorm(1e4, 0, 1.6)
points(sort(log(pnorm(-abs(z.indep)) * 2)), cex = 0.25, pch = 19, col = "green")
plot(sort(log(p)), cex = 0.25, pch = 19, ylim = c(-19, -2.5), xlim = c(1, 850), xlab = "Order", ylab = "log(p)")
set.seed(6)
z.indep <- rnorm(1e4)
points(sort(log(pnorm(-abs(z.indep)) * 2)), cex = 0.25, pch = 19, col = "blue")
z.indep <- rnorm(1e4, 0, 1.6)
points(sort(log(pnorm(-abs(z.indep)) * 2)), cex = 0.25, pch = 19, col = "green")
abline(h = log(0.005), col = "red", lty = 2)
abline(h = log(pnorm(-sqrt(2 * log(1e4))) * 2), col = "red", lty = 2)
abline(h = log(0.05 / 1e4), col = "red", lty = 2)
Version | Author | Date |
---|---|---|
adeab80 | LSun | 2018-05-06 |
W.sim.gd <- sapply(Z.sim, function (z) {
  fit.z <- gdfit(z, L = 10)
  list(w = fit.z$w, status = fit.z$status)
})
W.gtex.gd <- sapply(Z.gtex, function (z) {
  fit.z <- gdfit(z, L = 10)
  list(w = fit.z$w, status = fit.z$status)
})
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.4.3 ggplot2_2.2.1 plyr_1.8.4
[4] edgeR_3.20.2 limma_3.34.4 ashr_2.2-2
[7] Rmosek_8.0.69 PolynomF_1.0-1 CVXR_0.95
[10] REBayes_1.2 Matrix_1.2-12 SQUAREM_2017.10-1
[13] EQL_1.0-0 ttutils_1.0-1
loaded via a namespace (and not attached):
[1] gmp_0.5-13.1 Rcpp_0.12.16 pillar_1.0.1
[4] compiler_3.4.3 git2r_0.21.0 workflowr_1.0.1
[7] R.methodsS3_1.7.1 R.utils_2.6.0 iterators_1.0.9
[10] tools_3.4.3 digest_0.6.15 bit_1.1-12
[13] tibble_1.4.1 gtable_0.2.0 evaluate_0.10.1
[16] lattice_0.20-35 rlang_0.1.6 foreach_1.4.4
[19] yaml_2.1.18 parallel_3.4.3 Rmpfr_0.6-1
[22] ECOSolveR_0.4 stringr_1.3.0 knitr_1.20
[25] locfit_1.5-9.1 rprojroot_1.3-2 bit64_0.9-7
[28] grid_3.4.3 R6_2.2.2 rmarkdown_1.9
[31] magrittr_1.5 whisker_0.3-2 scales_0.5.0
[34] MASS_7.3-47 backports_1.1.2 codetools_0.2-15
[37] htmltools_0.3.6 scs_1.1-1 colorspace_1.3-2
[40] stringi_1.1.6 lazyeval_0.2.1 munsell_0.4.3
[43] pscl_1.5.2 doParallel_1.0.11 truncnorm_1.0-7
[46] R.oo_1.21.0
This reproducible R Markdown analysis was created with workflowr 1.0.1