Last updated: 2018-04-19
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180411)
The command set.seed(20180411)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: bb95415
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rproj.user/
Untracked files:
Untracked: analysis/pca_cell_cycle.Rmd
Untracked: analysis/ridge_mle.Rmd
Untracked: data/zip.test.txt
Untracked: data/zip.train.txt
Untracked: docs/figure/
Untracked: svd_flash_zip.Rmd
Unstaged changes:
Modified: analysis/cell_cycle.Rmd
Modified: analysis/density_est_cell_cycle.Rmd
Modified: analysis/eb_vs_soft.Rmd
Modified: analysis/eight_schools.Rmd
Modified: analysis/glmnet_intro.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | bb95415 | stephens999 | 2018-04-19 | wflow_publish(“analysis/svd_zip.Rmd”) |
Here we read in the “zipcode” training data, and extract the 2s and 3s.
z = read.table("data/zip.train.txt")
sub = (z[,1] == 2) | (z[,1]==3)
z23 = as.matrix(z[sub,])
Now we run svd (excluding the first column which are the labels)
z23.svd = svd(z23[,-1])
Plot the first two two singular vectors, colored by group, we see the second sv separates the groups reasonably well.
plot(z23.svd$u[,1],z23.svd$u[,2],col=z23[,1])
And a histogram suggests a mixture of two Gaussians might be a reasonable start:
hist(z23.svd$u[,2],breaks=seq(-0.07,0.07,length=20))
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] workflowr_0.11.0.9000 Rcpp_0.12.14 digest_0.6.13
[4] rprojroot_1.3-2 R.methodsS3_1.7.1 backports_1.1.2
[7] git2r_0.21.0 magrittr_1.5 evaluate_0.10.1
[10] stringi_1.1.6 whisker_0.3-2 R.oo_1.21.0
[13] R.utils_2.6.0 rmarkdown_1.8 tools_3.3.2
[16] stringr_1.2.0 yaml_2.1.16 htmltools_0.3.6
[19] knitr_1.18
This reproducible R Markdown analysis was created with workflowr 0.11.0.9000