Last updated: 2019-01-22
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(20180501) 
The command set.seed(20180501) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: 11f83fb 
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/.DS_Store
Untracked files:
    Untracked:  analysis/chipexoeg.Rmd
    Untracked:  analysis/efsd.Rmd
    Untracked:  analysis/talk1011.Rmd
    Untracked:  data/chipexo_examples/
    Untracked:  data/chipseq_examples/
    Untracked:  talk.Rmd
    Untracked:  talk.pdf
Unstaged changes:
    Modified:   analysis/binomial.Rmd
    Modified:   analysis/fda.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/sigma.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. 
| File | Version | Author | Date | Message | 
|---|---|---|---|---|
| Rmd | 11f83fb | Dongyue Xie | 2019-01-22 | wflow_publish(“analysis/r2b.Rmd”) | 
For the method used in these examples, see here
R function for shrinking adjusted R squared:
#'@param y,X: response and design matrix.
#'@output shrinked ajusted R squared
library(ashr)
R2Shrink=function(y,X,output='SR2'){
  if(is.null(dim(X))){p=1}else{p=ncol(X)}
  n=length(y)
  mod=lm(y~X,data=data.frame(y=y,X=X))
  mst=sum((y-mean(y))^2)/(n-1)
  mse=sum((y-fitted(mod))^2)/(n-p-1)
  mset=mse/mst
  fash=ashr::ash(log(mset),1,lik=ashr::lik_logF(df1=n-p-1,df2=n-1))
  R2s=1-exp(fash$result$PosteriorMean)
  if(output=='SR2'){
    return(R2s)
  }
  if(output=='all'){
    mod.sy=summary(mod)
    R2=mod.sy$r.squared
    R2a=mod.sy$adj.r.squared
    return(list(R2s=R2s,R2a=R2a,R2=R2))
  }
}
Let \(y=\mu+x_1\beta_1+x_2\beta_2+x_3\beta_3+N(0,I_n)\). Sample size \(n=100\).
For each of settings below, fit three models:
Compare shrinked, adjusted and original \(R^2\).
set.seed(12345)
n=100
p=3
X=matrix(rnorm(n*(p)),n,p)
beta=2*c(1,1,1)
y=X%*%beta+rnorm(n)
ra=R2Shrink(y,X[,1],output = 'all')
rb=R2Shrink(y,X[,1:2],output = 'all')
rc=R2Shrink(y,X,output = 'all')
rlt=rbind(unlist(ra),unlist(rb),unlist(rc))
colnames(rlt)=c('Shrinked R2','Adjusted R2','R2')
rownames(rlt)=c('a','b','c')
round(rlt,3)
  Shrinked R2 Adjusted R2    R2
a       0.314       0.465 0.470
b       0.740       0.742 0.747
c       0.935       0.935 0.937
set.seed(12345)
n=100
p=3
X=matrix(rnorm(n*(p)),n,p)
beta=2*c(1,1,0)
y=X%*%beta+rnorm(n)
ra=R2Shrink(y,X[,1],output = 'all')
rb=R2Shrink(y,X[,1:2],output = 'all')
rc=R2Shrink(y,X,output = 'all')
rlt=rbind(unlist(ra),unlist(rb),unlist(rc))
colnames(rlt)=c('Shrinked R2','Adjusted R2','R2')
rownames(rlt)=c('a','b','c')
round(rlt,3)
  Shrinked R2 Adjusted R2    R2
a       0.456       0.542 0.547
b       0.924       0.924 0.925
c       0.929       0.929 0.931
set.seed(12345)
n=100
p=3
X=matrix(rnorm(n*(p)),n,p)
beta=2*c(1,0,0)
y=X%*%beta+rnorm(n)
ra=R2Shrink(y,X[,1],output = 'all')
rb=R2Shrink(y,X[,1:2],output = 'all')
rc=R2Shrink(y,X,output = 'all')
rlt=rbind(unlist(ra),unlist(rb),unlist(rc))
colnames(rlt)=c('Shrinked R2','Adjusted R2','R2')
rownames(rlt)=c('a','b','c')
round(rlt,3)
  Shrinked R2 Adjusted R2    R2
a       0.855       0.857 0.858
b       0.857       0.858 0.861
c       0.867       0.868 0.872
set.seed(12345)
n=100
p=3
X=matrix(rnorm(n*(p)),n,p)
beta=c(0,0,0)
y=X%*%beta+rnorm(n)
ra=R2Shrink(y,X[,1],output = 'all')
rb=R2Shrink(y,X[,1:2],output = 'all')
rc=R2Shrink(y,X,output = 'all')
rlt=rbind(unlist(ra),unlist(rb),unlist(rc))
colnames(rlt)=c('Shrinked R2','Adjusted R2','R2')
rownames(rlt)=c('a','b','c')
round(rlt,3)
  Shrinked R2 Adjusted R2    R2
a           0       0.010 0.020
b           0       0.019 0.039
c           0       0.084 0.112
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] ashr_2.2-7
loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18      knitr_1.20        whisker_0.3-2    
 [4] magrittr_1.5      workflowr_1.1.1   REBayes_1.3      
 [7] MASS_7.3-51.1     pscl_1.5.2        doParallel_1.0.14
[10] SQUAREM_2017.10-1 lattice_0.20-35   foreach_1.4.4    
[13] stringr_1.3.1     tools_3.5.1       parallel_3.5.1   
[16] grid_3.5.1        R.oo_1.22.0       git2r_0.23.0     
[19] htmltools_0.3.6   iterators_1.0.10  assertthat_0.2.0 
[22] yaml_2.2.0        rprojroot_1.3-2   digest_0.6.17    
[25] Matrix_1.2-14     codetools_0.2-15  R.utils_2.7.0    
[28] evaluate_0.11     rmarkdown_1.10    stringi_1.2.4    
[31] compiler_3.5.1    Rmosek_8.0.69     backports_1.1.2  
[34] R.methodsS3_1.7.1 truncnorm_1.0-8  
This reproducible R Markdown analysis was created with workflowr 1.1.1