Last updated: 2018-07-27
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180714)
The command set.seed(20180714)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 09d1888
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: docs/.DS_Store
Ignored: docs/figure/.DS_Store
Untracked files:
Untracked: data/greedy19.rds
Untracked: data/warmstart2/
Unstaged changes:
Modified: analysis/index.Rmd
Modified: analysis/init_fn.Rmd
Modified: analysis/init_fn2.Rmd
Modified: analysis/warmstart.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 09d1888 | Jason Willwerscheid | 2018-07-27 | wflow_publish(c(“analysis/warmstart2.Rmd”)) |
I previously concluded that using warmstarts is more advantageous than not when ebnm_fn = ebnm_pn
. Here I repeat the experiment with ebnm_fn = ebnm_ash
.
# devtools::install_github("stephenslab/flashr", ref="trackObj")
devtools::load_all("/Users/willwerscheid/GitHub/flashr")
Loading flashr
# devtools::install_github("stephenslab/ebnm")
devtools::load_all("/Users/willwerscheid/GitHub/ebnm")
Loading ebnm
gtex <- readRDS(gzcon(url("https://github.com/stephenslab/gtexresults/blob/master/data/MatrixEQTLSumStats.Portable.Z.rds?raw=TRUE")))
strong <- t(gtex$strong.z)
# This block was run in advance.
res.no.warmstart <- flash_add_greedy(strong, Kmax=50, init_fn="udv_svd",
ebnm_fn="ebnm_ash", verbose=TRUE)
res.warmstart <- flash_add_greedy(strong, Kmax=50, init_fn="udv_svd",
ebnm_fn="ebnm_ash", warmstart=TRUE,
verbose=TRUE)
saveRDS(res.no.warmstart, "../data/warmstart2/nowarmstart.rds")
saveRDS(res.warmstart, "../data/warmstart2/warmstart.rds")
res.no.warmstart <- readRDS("./data/warmstart2/nowarmstart.rds")
res.warmstart <- readRDS("./data/warmstart2/warmstart.rds")
The total time (in seconds) needed to optimize factors is:
x1 <- unlist(res.no.warmstart$opt_time)
x2 <- unlist(res.warmstart$opt_time)
list(no.warmstart = sum(x1), warmstart = sum(x2))
$no.warmstart
[1] 686.3349
$warmstart
[1] 606.5611
The time required per factor/loading is as follows.
plot(x1, ylim=c(0, max(x1) + 1), pch=19, col="blue",
xlab="Factor/loading index", ylab="Optimization time (s)")
points(x2, pch=17, col="red")
legend("topright", c("No warmstart", "Warmstart"),
pch=c(19, 17), col=c("blue", "red"))
So using a warmstart often provides a speed-up, but not nearly as reliably as with ebnm_pn
. Indeed, for several factor/loading pairs (17, 19, 22, 24, 26), the warmstart considerably slows things down.
As with ebnm_pn
, using warmstarts yields a slightly worse overall objective.
list(no.warmstart = flash_get_objective(strong, res.no.warmstart$f),
warmstart = flash_get_objective(strong, res.warmstart$f))
$no.warmstart
[1] -1252812
$warmstart
[1] -1252855
The per-factor difference in objective is as follows.
o1 <- sapply(res.no.warmstart$obj,
function(obj) {max(unlist(obj))})
o2 <- sapply(res.warmstart$obj,
function(obj) {max(unlist(obj))})
plot(o2 - o1, type='l',
xlab="Factor/loading index",
ylab="Diff. in obj. using warmstart",
main="Difference in overall objective after adding each factor")
The advantages of using warmstarts are not nearly as compelling as they were with ebnm_pn
. The speed-up is not as reliable, and the sacrifice in the final objective attained is greater. Further, I have not seen Rmosek
fail for ebnm_ash
in the same way that optim
fails for ebnm_pn
.
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ebnm_0.1-13 flashr_0.5-12
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 pillar_1.2.1 plyr_1.8.4
[4] compiler_3.4.3 git2r_0.21.0 workflowr_1.0.1
[7] R.methodsS3_1.7.1 R.utils_2.6.0 iterators_1.0.9
[10] tools_3.4.3 testthat_2.0.0 digest_0.6.15
[13] tibble_1.4.2 evaluate_0.10.1 memoise_1.1.0
[16] gtable_0.2.0 lattice_0.20-35 rlang_0.2.0
[19] Matrix_1.2-12 foreach_1.4.4 commonmark_1.4
[22] yaml_2.1.17 parallel_3.4.3 withr_2.1.1.9000
[25] stringr_1.3.0 roxygen2_6.0.1.9000 xml2_1.2.0
[28] knitr_1.20 devtools_1.13.4 rprojroot_1.3-2
[31] grid_3.4.3 R6_2.2.2 rmarkdown_1.8
[34] ggplot2_2.2.1 ashr_2.2-10 magrittr_1.5
[37] whisker_0.3-2 backports_1.1.2 scales_0.5.0
[40] codetools_0.2-15 htmltools_0.3.6 MASS_7.3-48
[43] assertthat_0.2.0 softImpute_1.4 colorspace_1.3-2
[46] stringi_1.1.6 lazyeval_0.2.1 munsell_0.4.3
[49] doParallel_1.0.11 pscl_1.5.2 truncnorm_1.0-8
[52] SQUAREM_2017.10-1 R.oo_1.21.0
This reproducible R Markdown analysis was created with workflowr 1.0.1