Last updated: 2018-09-20
workflowr checks: ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
✔ Seed: set.seed(1)
The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 2c5aa17
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: analysis/include/.DS_Store
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Ignored: docs/.DS_Store
Ignored: output/.DS_Store
Untracked files:
Untracked: analysis/Classify.Rmd
Untracked: analysis/EstimateCorEM2.Rmd
Untracked: analysis/EstimateCorEM3.Rmd
Untracked: analysis/EstimateCorMaxEMGD.Rmd
Untracked: analysis/EstimateCorMaxGD.Rmd
Untracked: analysis/EstimateCorOptimEM.Rmd
Untracked: analysis/EstimateCorPrior.Rmd
Untracked: analysis/EstimateCorSol.Rmd
Untracked: analysis/HierarchicalFlashSim.Rmd
Untracked: analysis/MashLowSignalGTEx4.Rmd
Untracked: analysis/Mash_GTEx.Rmd
Untracked: analysis/MeanAsh.Rmd
Untracked: analysis/OutlierDetection.Rmd
Untracked: analysis/OutlierDetection2.Rmd
Untracked: analysis/OutlierDetection3.Rmd
Untracked: analysis/OutlierDetection4.Rmd
Untracked: analysis/Test.Rmd
Untracked: analysis/mash_missing_row.Rmd
Untracked: code/GTExNullModel.R
Untracked: code/MashClassify.R
Untracked: code/MashCorResult.R
Untracked: code/MashNULLCorResult.R
Untracked: code/MashSource.R
Untracked: code/Weight_plot.R
Untracked: code/addemV.R
Untracked: code/estimate_cor.R
Untracked: code/generateDataV.R
Untracked: code/johnprocess.R
Untracked: code/sim_mean_sig.R
Untracked: code/summary.R
Untracked: data/Blischak_et_al_2015/
Untracked: data/scale_data.rds
Untracked: docs/figure/Classify.Rmd/
Untracked: docs/figure/OutlierDetection.Rmd/
Untracked: docs/figure/OutlierDetection2.Rmd/
Untracked: docs/figure/OutlierDetection3.Rmd/
Untracked: docs/figure/Test.Rmd/
Untracked: docs/figure/mash_missing_whole_row_5.Rmd/
Untracked: docs/include/
Untracked: output/AddEMV/
Untracked: output/CovED_UKBio_strong.rds
Untracked: output/CovED_UKBio_strong_Z.rds
Untracked: output/Flash_UKBio_strong.rds
Untracked: output/GTExNULLres/
Untracked: output/GTEx_2.5_nullData.rds
Untracked: output/GTEx_2.5_nullModel.rds
Untracked: output/GTEx_2.5_nullPermData.rds
Untracked: output/GTEx_2.5_nullPermModel.rds
Untracked: output/GTEx_3.5_nullData.rds
Untracked: output/GTEx_3.5_nullModel.rds
Untracked: output/GTEx_3.5_nullPermData.rds
Untracked: output/GTEx_3.5_nullPermModel.rds
Untracked: output/GTEx_3_nullData.rds
Untracked: output/GTEx_3_nullModel.rds
Untracked: output/GTEx_3_nullPermData.rds
Untracked: output/GTEx_3_nullPermModel.rds
Untracked: output/GTEx_4.5_nullData.rds
Untracked: output/GTEx_4.5_nullModel.rds
Untracked: output/GTEx_4.5_nullPermData.rds
Untracked: output/GTEx_4.5_nullPermModel.rds
Untracked: output/GTEx_4_nullData.rds
Untracked: output/GTEx_4_nullModel.rds
Untracked: output/GTEx_4_nullPermData.rds
Untracked: output/GTEx_4_nullPermModel.rds
Untracked: output/MASH.10.em2.result.rds
Untracked: output/MASH.10.mle.result.rds
Untracked: output/MASHNULL.V.result.1.rds
Untracked: output/MASHNULL.V.result.10.rds
Untracked: output/MASHNULL.V.result.11.rds
Untracked: output/MASHNULL.V.result.12.rds
Untracked: output/MASHNULL.V.result.13.rds
Untracked: output/MASHNULL.V.result.14.rds
Untracked: output/MASHNULL.V.result.15.rds
Untracked: output/MASHNULL.V.result.16.rds
Untracked: output/MASHNULL.V.result.17.rds
Untracked: output/MASHNULL.V.result.18.rds
Untracked: output/MASHNULL.V.result.19.rds
Untracked: output/MASHNULL.V.result.2.rds
Untracked: output/MASHNULL.V.result.20.rds
Untracked: output/MASHNULL.V.result.3.rds
Untracked: output/MASHNULL.V.result.4.rds
Untracked: output/MASHNULL.V.result.5.rds
Untracked: output/MASHNULL.V.result.6.rds
Untracked: output/MASHNULL.V.result.7.rds
Untracked: output/MASHNULL.V.result.8.rds
Untracked: output/MASHNULL.V.result.9.rds
Untracked: output/MashCorSim--midway/
Untracked: output/Mash_EE_Cov_0_plusR1.rds
Untracked: output/UKBio_mash_model.rds
Unstaged changes:
Deleted: analysis/EstimateCorMax.Rmd
Modified: analysis/EstimateCorMaxEM2.Rmd
Deleted: analysis/MashLowSignalGTEx3.5P.Rmd
Modified: analysis/Mash_UKBio.Rmd
Modified: analysis/mash_missing_samplesize.Rmd
Modified: output/Flash_T2_0.rds
Modified: output/Flash_T2_0_mclust.rds
Modified: output/Mash_model_0_plusR1.rds
Modified: output/PresiAddVarCol.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message
---|---|---|---|---
Rmd | 2c5aa17 | zouyuxin | 2018-09-20 | wflow_publish("analysis/EstimateCorEM.Rmd")
library(mashr)
Loading required package: ashr
source('../code/generateDataV.R')
source('../code/summary.R')
\[ P(X,\mathbf{z}|\rho, \pi) = \prod_{i=1}^{n} \prod_{k=0}^{K}\left[\pi_{k}N(x_{i}; 0, \Sigma_{k})\right]^{\mathbb{I}(z_{i}=k)} \prod_{k=0}^{K}\pi_{k}^{\lambda_{k}-1} \]
\[ \mathbb{E}_{\mathbf{z}|X} \log P(X,\mathbf{z}|\rho, \pi) = \sum_{i=1}^{n} \sum_{k=0}^{K} P(z_{i}=k|X)\left[ \log \pi_{k} + \log N(x_{i}; 0, \Sigma_{k})\right] + \sum_{k=0}^{K} (\lambda_{k}-1)\log \pi_{k} \]
\[ \gamma_{z_{i}}(k) = P(z_{i}=k|X_{i}) = \frac{\pi_{k}N(x_{i}; 0, \Sigma_{k})}{\sum_{k'=0}^{K}\pi_{k'}N(x_{i}; 0, \Sigma_{k'})} \]
\(\pi\): \[ \sum_{i=1}^{n} \gamma_{z_{i}}(k) \frac{1}{\pi_{k}} + \frac{\lambda_{k}-1}{\pi_{k}} - \lambda = 0 \quad \rightarrow \quad \pi_{k} = \frac{1}{\lambda} \left(\sum_{i=1}^{n} \gamma_{z_{i}}(k) + \lambda_{k}-1\right) \quad \lambda = n + \sum_{k=0}^{K}\lambda_{k} - (K+1) \]
\[ \hat{\pi}_{k} = \frac{\sum_{i=1}^{n} \gamma_{z_{i}}(k) + \lambda_{k} - 1 }{n + \sum_{k=0}^{K}\lambda_{k} - (K+1) } \]
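Plugging this expression for \(\pi_{k}\) into the constraint \(\sum_{k=0}^{K}\pi_{k}=1\), and using \(\sum_{k=0}^{K}\gamma_{z_{i}}(k)=1\) for each \(i\), gives the value of the multiplier:
\[ 1 = \frac{1}{\lambda}\sum_{k=0}^{K}\left(\sum_{i=1}^{n}\gamma_{z_{i}}(k) + \lambda_{k} - 1\right) \quad \Rightarrow \quad \lambda = n + \sum_{k=0}^{K}\lambda_{k} - (K+1) \]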
\(\rho\): \[ \begin{align*} f(\rho) &= \sum_{i=1}^{n} \sum_{k=0}^{K} \gamma_{z_{i}}(k)\left[ -\frac{1}{2}\log (1-\phi_{k}^2)-\frac{1}{2(1-\phi_{k}^2)}\left[ \frac{x_{i}^2}{\sigma_{k11}^2} + \frac{y_{i}^2}{\sigma_{k22}^2} - \frac{2\phi_{k}x_{i}y_{i}}{\sigma_{k11}\sigma_{k22}}\right] \right]\\ f(\rho)' &= \sum_{i=1}^{n} \sum_{k=0}^{K} \gamma_{z_{i}}(k)\left[ \frac{\phi_{k}}{1-\phi_{k}^2}-\frac{\phi_{k}}{(1-\phi_{k}^2)^2}\left[ \frac{x_{i}^2}{\sigma_{k11}^2} + \frac{y_{i}^2}{\sigma_{k22}^2}\right] + \frac{1+\phi_{k}^2}{(1-\phi_{k}^2)^2}\frac{x_{i}y_{i}}{\sigma_{k11}\sigma_{k22}}\right]\frac{1}{\sigma_{k11}\sigma_{k22}} = 0 \end{align*} \] where \(\phi_k = \frac{\rho + u_{k12}}{\sigma_{k11}\sigma_{k22}}\), so each \(\phi_{k}\) is a function of \(\rho\). This one-dimensional stationarity condition has no convenient closed form and is solved numerically in the implementation below (via optimize over \((-1, 1)\)).
Algorithm:
  Input: X, Ulist, init_rho, init_pi
  Compute loglikelihood
  delta = 1
  while delta > tol:
    E step: update z
    M step: update pi, update rho
    Compute loglikelihood
    Update delta
(The implementation below runs this fixed-point iteration through SQUAREM for acceleration; a plain, un-accelerated version of the loop is sketched after the implementation.)
get_sigma <- function(rho, Ulist){
V <- matrix(c(1,rho,rho,1), 2,2)
lapply(Ulist, function(U) U + V)
}
penalty <- function(prior, pi_s){
subset <- (prior != 1.0)
sum((prior-1)[subset]*log(pi_s[subset]))
}
compute.log.lik <- function(lL, p, prior){
p = normalize(pmax(0,p))
# lL$loglik_matrix stores log-likelihoods with the row maxima (lL$lfactors) subtracted,
# so adding lfactors back after mixing over p recovers the marginal log-likelihood (log-sum-exp trick)
temp = log(exp(lL$loglik_matrix) %*% p)+lL$lfactors
return(sum(temp) + penalty(prior, p))
# return(sum(temp))
}
normalize <- function(x){
x/sum(x)
}
mixture.EM.times <- function(X, Ulist, init_rho=0, init_pi=NULL, prior = c('nullbiased', 'uniform'), control = list()){
times = length(init_rho)
result = list()
loglik = c()
rho = c()
time.t = c()
converge.status = c()
for(i in 1:times){
out.time = system.time(result[[i]] <- mixture.EM(X, Ulist,
init_pi=init_pi,
init_rho=init_rho[i],
prior=prior,
control = control))
time.t = c(time.t, out.time['elapsed'])
loglik = c(loglik, result[[i]]$loglik) # mixture.EM returns the penalized log-likelihood as $loglik
rho = c(rho, result[[i]]$rho) # and the estimate of rho as $rho
converge.status = c(converge.status, result[[i]]$converged)
}
if(abs(max(loglik) - min(loglik)) < 1e-4){
status = 'global'
}else{
status = 'local'
}
ind = which.max(loglik)
return(list(result = result[[ind]], status = status, loglik = loglik, rho=rho, time = time.t, converge.status = converge.status))
}
mixture.EM <- function(X, Ulist, init_rho=0, init_pi = NULL, prior = c('nullbiased', 'uniform'), control = list()) {
prior = match.arg(prior)
prior <- mashr:::set_prior(length(Ulist), prior)
k = length(Ulist)
if (is.null(init_pi)){
init_pi <- rep(1/k,k)
}
control = ashr:::set_control_squarem(control,nrow(X))
res = SQUAREM::squarem(par=c(init_pi, init_rho),fixptfn=fixpoint, objfn=negpenloglik,X=X, Ulist=Ulist, prior=prior, control=control)
return(list(pihat = normalize(pmax(0,head(res$par, -1))), rho = tail(res$par, 1), loglik=-res$value.objfn, niter = res$iter, converged=res$convergence, control=control))
}
fixpoint = function(para, X, Ulist, prior){
rho = tail(para,1)
pi_s = head(para, -1)
pi_s = normalize(pmax(0,pi_s)) #avoid occasional problems with negative pis due to rounding
# compute L
Sigma <- get_sigma(rho, Ulist)
L <- t(plyr::laply(Sigma,function(U){mvtnorm::dmvnorm(x=X,sigma=U)}))
# E
m = t(pi_s * t(L)) # matrix_lik is n by k; so this is also n by k
m.rowsum = rowSums(m)
classprob = m/m.rowsum #an n by k matrix
# M
pinew = normalize(colSums(classprob) + prior - 1)
rhonew = optimize(EMloglikelihood, interval = c(-1,1), maximum = TRUE, X = X, Ulist = Ulist, z = classprob)$maximum
return(c(pinew,rhonew))
}
EMloglikelihood = function(rho, X, Ulist, z){
Sigma = get_sigma(rho, Ulist)
L = t(plyr::laply(Sigma,function(U){mvtnorm::dmvnorm(x=X,sigma=U, log=TRUE)}))
sum(L * z)
}
negpenloglik = function(para, X, Ulist, prior){
Sigma <- get_sigma(tail(para,1), Ulist)
lL <- t(plyr::laply(Sigma,function(U){mvtnorm::dmvnorm(x=X,sigma=U, log=TRUE)}))
lfactors <- apply(lL,1,max)
matrix_llik <- lL - lfactors
lL = list(loglik_matrix = matrix_llik,
lfactors = lfactors)
ll <- compute.log.lik(lL, head(para, -1), prior)
return(-ll)
}
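The SQUAREM call in mixture.EM hides the explicit E/M loop from the pseudocode above. Below is a minimal, un-accelerated sketch of the same fixed-point iteration, reusing the fixpoint, negpenloglik and normalize helpers defined above; mixture.EM.plain is an illustrative name, not part of the original analysis.

mixture.EM.plain <- function(X, Ulist, init_rho = 0, init_pi = NULL,
                             prior = c('nullbiased', 'uniform'), tol = 1e-5, maxiter = 5000){
  prior <- mashr:::set_prior(length(Ulist), match.arg(prior))
  k <- length(Ulist)
  if (is.null(init_pi)) init_pi <- rep(1/k, k)
  para <- c(init_pi, init_rho)
  ll <- -negpenloglik(para, X, Ulist, prior)   # penalized log-likelihood at the starting point
  for (iter in 1:maxiter) {
    para <- fixpoint(para, X, Ulist, prior)    # one E step (responsibilities) + M step (pi and rho)
    ll.new <- -negpenloglik(para, X, Ulist, prior)
    if (abs(ll.new - ll) < tol) break          # stop once the log-likelihood stops improving
    ll <- ll.new
  }
  list(pihat = normalize(pmax(0, head(para, -1))), rho = tail(para, 1),
       loglik = ll.new, niter = iter)
}

SQUAREM applies the same fixed-point map but extrapolates between iterations, so it typically needs far fewer E/M steps to reach the same solution.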
\[ \hat{\beta}|\beta \sim N_{2}(\hat{\beta}; \beta, \left(\begin{matrix} 1 & 0.5 \\ 0.5 & 1 \end{matrix}\right)) \]
\[ \beta \sim \frac{1}{4}\delta_{0} + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 0 & 0 \\ 0 & 1 \end{matrix}\right)) + \frac{1}{4}N_{2}(0, \left(\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}\right)) \]
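The simulated data come from generate_data() in code/generateDataV.R, which is not included in this page. Below is a rough sketch of a simulator for the model above; the name generate_data_sketch and the exact return format (B, Bhat, Shat) are assumptions made to mirror how the results are used later.

# Illustrative simulator for the model above (not the original generateDataV.R).
generate_data_sketch <- function(n, p, Sigma, Utrue, pi = rep(1/length(Utrue), length(Utrue))){
  K <- length(Utrue)
  z <- sample(K, n, replace = TRUE, prob = pi)                            # mixture component for each sample
  B <- t(sapply(z, function(k) mvtnorm::rmvnorm(1, sigma = Utrue[[k]])))  # true effects beta
  E <- mvtnorm::rmvnorm(n, sigma = Sigma)                                 # correlated errors (rho = 0.5)
  list(B = B, Bhat = B + E, Shat = matrix(1, n, p))
}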
We simulate n = 4000 samples from this model:
set.seed(1)
n = 4000; p = 2
Sigma = matrix(c(1,0.5,0.5,1),p,p)
U0 = matrix(0,2,2)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[2,2] = 1
U3 = matrix(1,2,2)
Utrue = list(U0=U0, U1=U1, U2=U2, U3=U3)
data = generate_data(n, p, Sigma, Utrue)
m.data = mash_set_data(data$Bhat, data$Shat)
U.c = cov_canonical(m.data)
grid = mashr:::autoselect_grid(m.data, sqrt(2))
Ulist = mashr:::normalize_Ulist(U.c)
xUlist = mashr:::expand_cov(Ulist,grid,usepointmass = TRUE)
result.em <- mixture.EM(m.data$Bhat, xUlist)
The estimated \(\rho\) is 0.5066755.
m.data.em = mash_set_data(data$Bhat, data$Shat, V = matrix(c(1,result.em$rho,result.em$rho,1),2,2))
U.c = cov_canonical(m.data.em)
m.em = mash(m.data.em, U.c, verbose= FALSE)
null.ind = which(apply(data$B,1,sum) == 0)
The log likelihood is \(-1.230254 \times 10^{4}\). There are 26 significant samples and 0 false positives. The RRMSE is 0.582108.
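These counts and the RRMSE are computed with helpers from code/summary.R, which is not shown here. A sketch of how such numbers could be obtained from the fitted mash object; the RRMSE definition (error of the posterior means relative to the error of the raw Bhat) is an assumption.

# Illustrative summaries; summary.R itself may define these differently.
sig.ind <- get_significant_results(m.em)              # samples significant at the default lfsr threshold
num.sig <- length(sig.ind)                            # reported as "significant samples"
false.pos <- length(intersect(sig.ind, null.ind))     # significant samples whose true effect is zero
rrmse <- sqrt(mean((get_pm(m.em) - data$B)^2) / mean((data$Bhat - data$B)^2))  # assumed RRMSE definition
c(loglik = get_loglik(m.em), significant = num.sig, false.positive = false.pos, RRMSE = rrmse)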
Algorithm:
  Input: X, Ulist, init_rho
  Given rho, estimate pi by max loglikelihood (convex problem)
  Compute loglikelihood
  delta = 1
  while delta > tol:
    E step: compute z
    M step: update rho
    Given rho, estimate pi by max loglikelihood (convex problem)
    Compute loglikelihood
    Update delta
mixture.M.times <- function(X, Ulist, init_rho=0, init_pi=NULL, prior = c('nullbiased', 'uniform'), control = list()){
times = length(init_rho)
result = list()
loglik = c()
rho = c()
time.t = c()
converge.status = c()
for(i in 1:times){
out.time = system.time(result[[i]] <- mixture.M(X, Ulist,
init_pi=init_pi,
init_rho=init_rho[i],
prior=prior)) # mixture.M does not take a control argument, so it is not passed on
time.t = c(time.t, out.time['elapsed'])
loglik = c(loglik, tail(result[[i]]$loglik, 1)) # mixture.M returns the full log-likelihood trace; keep its final value
rho = c(rho, result[[i]]$rho)
converge.status = c(converge.status, result[[i]]$converged)
}
if(abs(max(loglik) - min(loglik)) < 1e-4){
status = 'global'
}else{
status = 'local'
}
ind = which.max(loglik)
return(list(result = result[[ind]], status = status, loglik = loglik, rho=rho, time = time.t, converge.status = converge.status))
}
mixture.M <- function(X, Ulist, init_rho=0, init_pi = NULL, tol=1e-5, prior = c('nullbiased', 'uniform')) {
prior <- match.arg(prior)
m.model = fit_mash(X, Ulist, rho = init_rho, prior=prior)
pi_s = get_estimated_pi(m.model, dimension = 'all')
# get complete Ulist
xUlist = mashr:::expand_cov(Ulist, m.model$fitted_g$grid, m.model$fitted_g$usepointmass)
prior.v <- mashr:::set_prior(length(pi_s), prior)
# compute loglikelihood
log_liks <- c()
log_liks <- c(log_liks, get_loglik(m.model)+penalty(prior.v, pi_s))
delta.ll <- 1
niter <- 0
rho = init_rho
while(delta.ll > tol){
# compute L
Sigma <- get_sigma(rho, xUlist)
L <- t(plyr::laply(Sigma,function(U){mvtnorm::dmvnorm(x=X,sigma=U)}))
m = t(pi_s * t(L)) # matrix_lik is n by k; so this is also n by k
m.rowsum = rowSums(m)
classprob = m/m.rowsum #an n by k matrix
# max_rho
rho <- optimize(EMloglikelihood, interval = c(-1,1), maximum = TRUE, X = X, Ulist = xUlist, z = classprob)$maximum
m.model = fit_mash(X, Ulist, rho, prior=prior)
pi_s = get_estimated_pi(m.model, dimension = 'all')
log_liks <- c(log_liks, get_loglik(m.model)+penalty(prior.v, pi_s))
# Update delta
delta.ll <- log_liks[length(log_liks)] - log_liks[length(log_liks)-1]
niter <- niter + 1
}
return(list(pihat = normalize(pi_s), rho = rho, loglik=log_liks, niter = niter, converged = TRUE)) # the loop exits only once the log-likelihood change drops below tol
}
EMloglikelihood = function(rho, X, Ulist, z){
Sigma = get_sigma(rho, Ulist)
L = t(plyr::laply(Sigma,function(U){mvtnorm::dmvnorm(x=X,sigma=U, log=TRUE)}))
sum(L * z)
}
fit_mash <- function(X, Ulist, rho, prior=c('nullbiased', 'uniform')){
m.data = mashr::mash_set_data(Bhat=X, Shat=1, V = matrix(c(1, rho, rho, 1), 2, 2))
m.model = mashr::mash(m.data, Ulist, prior=prior, verbose = FALSE)
return(m.model)
}
set.seed(1)
n = 4000; p = 2
Sigma = matrix(c(1,0.5,0.5,1),p,p)
U0 = matrix(0,2,2)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[2,2] = 1
U3 = matrix(1,2,2)
Utrue = list(U0=U0, U1=U1, U2=U2, U3=U3)
data = generate_data(n, p, Sigma, Utrue)
m.data = mash_set_data(data$Bhat, data$Shat)
U.c = cov_canonical(m.data)
# grid = mashr:::autoselect_grid(m.data, sqrt(2))
# Ulist = mashr:::normalize_Ulist(U.c)
# xUlist = mashr:::expand_cov(Ulist,grid,usepointmass=TRUE)
result.m <- mixture.M(m.data$Bhat, Ulist) # Ulist is the normalized canonical list created in the previous section
The estimated \(\rho\) is 0.5062801.
m.data.m = mash_set_data(data$Bhat, data$Shat, V = matrix(c(1,result.m$rho,result.m$rho,1),2,2))
U.c.m = cov_canonical(m.data.m)
m.m = mash(m.data.m, U.c.m, verbose = FALSE)
null.ind = which(apply(data$B,1,sum) == 0)
The log likelihood is \(-1.230254 \times 10^{4}\). There are 26 significant samples and 0 false positives. The RRMSE is 0.5820861.
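Both procedures arrive at essentially the same estimate of \(\rho\) and the same penalized log-likelihood. A quick side-by-side check (illustrative only):

# Compare the two fits: estimated rho and final penalized log-likelihood.
rbind(rho = c(EM = result.em$rho, M = result.m$rho),
      loglik = c(EM = result.em$loglik, M = tail(result.m$loglik, 1)))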
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mashr_0.2-12 ashr_2.2-13
loaded via a namespace (and not attached):
[1] Rcpp_0.12.18 knitr_1.20 whisker_0.3-2
[4] magrittr_1.5 workflowr_1.1.1 REBayes_1.3
[7] MASS_7.3-50 pscl_1.5.2 doParallel_1.0.11
[10] SQUAREM_2017.10-1 lattice_0.20-35 foreach_1.4.4
[13] plyr_1.8.4 stringr_1.3.1 tools_3.5.1
[16] parallel_3.5.1 grid_3.5.1 R.oo_1.22.0
[19] rmeta_3.0 git2r_0.23.0 htmltools_0.3.6
[22] iterators_1.0.10 assertthat_0.2.0 yaml_2.2.0
[25] rprojroot_1.3-2 digest_0.6.15 Matrix_1.2-14
[28] codetools_0.2-15 R.utils_2.6.0 evaluate_0.11
[31] rmarkdown_1.10 stringi_1.2.4 compiler_3.5.1
[34] Rmosek_8.0.69 backports_1.1.2 R.methodsS3_1.7.1
[37] mvtnorm_1.0-8 truncnorm_1.0-8
This reproducible R Markdown analysis was created with workflowr 1.1.1