Last updated: 2019-01-27
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180501)
The command set.seed(20180501)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: a1a64c5
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Untracked files:
Untracked: analysis/chipexoeg.Rmd
Untracked: analysis/efsd.Rmd
Untracked: analysis/talk1011.Rmd
Untracked: data/chipexo_examples/
Untracked: data/chipseq_examples/
Untracked: talk.Rmd
Untracked: talk.pdf
Unstaged changes:
Modified: analysis/binomial.Rmd
Modified: analysis/fda.Rmd
Modified: analysis/index.Rmd
Modified: analysis/r2.Rmd
Modified: analysis/sigma.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | a1a64c5 | Dongyue Xie | 2019-01-27 | wflow_publish(“analysis/r2b.Rmd”) |
html | 9ce257c | Dongyue Xie | 2019-01-27 | Build site. |
Rmd | d6bcb1a | Dongyue Xie | 2019-01-27 | wflow_publish(“analysis/r2b.Rmd”) |
html | 97b73fc | Dongyue Xie | 2019-01-22 | Build site. |
Rmd | b5029e2 | Dongyue Xie | 2019-01-22 | wflow_publish(“analysis/r2b.Rmd”) |
html | ad12c40 | Dongyue Xie | 2019-01-22 | Build site. |
Rmd | 11f83fb | Dongyue Xie | 2019-01-22 | wflow_publish(“analysis/r2b.Rmd”) |
For the method used in these examples, see here
True \(R^2\) is defined as \(R^2=\frac{var(X\beta)}{var(y)}=\frac{var(y)-\sigma^2}{var(y)}=1-\frac{\sigma^2}{var(y)}=1-\frac{\sigma^2}{\sigma^2+var(X\beta)}\)
Ajusted R^2: \(1-\frac{\sum(y_i-\hat y_i)^2/(n-p-1)}{\sum(y_i-\bar y)^2/(n-1)}\)
Shrink adjusted R^2: use fash shrinking \(fash.output=\log(\frac{\sum(y_i-\hat y_i)^2/(n-p-1)}{\sum(y_i-\bar y)^2/(n-1)})\) then shrunk adjusted R^2 is \(1-\exp(fash.output)\)
Shrink R^2 = \(1 - \exp(fash.output)*\frac{n-p-1}{n-1}\), because \(R^2=1-\frac{\sum(y_i-\hat y_i)^2}{\sum(y_i-\bar y)^2}=1-\frac{\sum(y_i-\hat y_i)^2/(n-p-1)}{\sum(y_i-\bar y)^2/(n-1)}*\frac{n-p-1}{n-1}=1-(1-adjR^2)*\frac{n-p-1}{n-1}\)
R function for shrinking adjusted R/ R squared:
library(ashr)
#'@param R2: R squared from linear regression model fit
#'@param n: sample size
#'@param p: number od covariates
#'@output shrinked R squared.
ash_r2=function(R2,n,p){
df1=n-p-1
df2=n-1
log.ratio=log((1-R2)/(df1)*(df2))
shrink.log.ratio=ash(log.ratio,1,lik=lik_logF(df1=df1,df2=df2))$result$PosteriorMean
shrinked.ar2=1-exp(shrink.log.ratio)
shrinked.r2=1-exp(shrink.log.ratio)*df1/df2
return(list(shrinked.r2=shrinked.r2,shrinked.ar2=shrinked.ar2))
}
Assume linear model \(y=X\beta+\epsilon\) where \(\epsilon\sim N(0,\sigma^2I)\)
set.seed(1234)
n=100
p=5
R2=c()
R2a=c()
trueR2=c()
beta.list=seq(0,1,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X%*%beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2s=ash_r2(R2,n,p)$shrinked.r2
R2as=ash_r2(R2,n,p)$shrinked.ar2
plot(beta.list,R2,ylim=range(c(R2,R2a,R2s,R2as,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2a,col=2)
lines(beta.list,R2as,col=3)
lines(beta.list,R2s,col=4)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R^2','Adjusted R^2','Adjusted R^2 fash','R^2 fash','True R^2'),lty=c(1,1,1,1,1),col=c(1,2,3,4,'grey80'))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
plot(trueR2,R2,type='l',ylim=range(c(R2,R2a,R2s,R2as,trueR2)))
lines(trueR2,R2a,col=2)
lines(trueR2,R2as,col=3)
lines(trueR2,R2s,col=4)
lines(trueR2,trueR2,col='grey80')
legend('topleft',c('R^2','Adjusted R^2','Adjusted R^2 fash','R^2 fash'),lty=c(1,1,1,1),col=c(1,2,3,4))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
set.seed(1234)
n=20
p=3
R2=c()
R2a=c()
trueR2=c()
beta.list=seq(0,1,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X%*%beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2s=ash_r2(R2,n,p)$shrinked.r2
R2as=ash_r2(R2,n,p)$shrinked.ar2
plot(beta.list,R2,ylim=range(c(R2,R2a,R2s,R2as,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2a,col=2)
lines(beta.list,R2as,col=3)
lines(beta.list,R2s,col=4)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R^2','Adjusted R^2','Adjusted R^2 fash','R^2 fash','True R^2'),lty=c(1,1,1,1,1),col=c(1,2,3,4,'grey80'))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
plot(trueR2,R2,type='l',ylim=range(c(R2,R2a,R2s,R2as,trueR2)))
lines(trueR2,R2a,col=2)
lines(trueR2,R2as,col=3)
lines(trueR2,R2s,col=4)
lines(trueR2,trueR2,col='grey80')
legend('topleft',c('R^2','Adjusted R^2','Adjusted R^2 fash','R^2 fash'),lty=c(1,1,1,1),col=c(1,2,3,4))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
When n is small, all \(R^2\) are shrunk to zero. Try to increase beta: from 0-1 to 0-2
set.seed(1234)
n=20
p=3
R2=c()
R2a=c()
trueR2=c()
beta.list=seq(0,2,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X%*%beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2s=ash_r2(R2,n,p)$shrinked.r2
R2as=ash_r2(R2,n,p)$shrinked.ar2
plot(beta.list,R2,ylim=range(c(R2,R2a,R2s,R2as,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2a,col=2)
lines(beta.list,R2as,col=3)
lines(beta.list,R2s,col=4)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R^2','Adjusted R^2','Adjusted R^2 fash','R^2 fash','True R^2'),lty=c(1,1,1,1,1),col=c(1,2,3,4,'grey80'))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
plot(trueR2,R2,type='l',ylim=range(c(R2,R2a,R2s,R2as,trueR2)))
lines(trueR2,R2a,col=2)
lines(trueR2,R2as,col=3)
lines(trueR2,R2s,col=4)
lines(trueR2,trueR2,col='grey80')
legend('topleft',c('R^2','Adjusted R^2','Adjusted R^2 fash','R^2 fash'),lty=c(1,1,1,1),col=c(1,2,3,4))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
n=100,p=1
library(CorShrink)
set.seed(1234)
n=100
p=1
R2=c()
#R2a=c()
trueR2=c()
beta.list=seq(0,1,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X*beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2.fash=ash_r2(R2,n,p)$shrinked.r2
R2.cor=(CorShrinkVector(sqrt(R2),n))^2
plot(beta.list,R2,ylim=range(c(R2.fash,R2.cor,R2,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2.cor,col=2)
lines(beta.list,R2.fash,col=4)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R2','R^2 fash','R^2 CorShrink','True R^2'),lty=c(1,1,1,1),col=c(1,4,2,'grey80'))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
plot(trueR2,R2.fash,type='l',ylim=range(c(R2.fash,R2.cor,trueR2)),ylab='')
lines(trueR2,R2.cor,col=2)
lines(trueR2,trueR2,col='grey80')
legend('topleft',c('R^2 fash','R^2 CorShrink'),lty=c(1,1),col=c(1,2))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
n=20,p=1
set.seed(1234)
n=20
p=1
R2=c()
#R2a=c()
trueR2=c()
beta.list=seq(0,1,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X*beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2.fash=ash_r2(R2,n,p)$shrinked.r2
R2.cor=(CorShrinkVector(sqrt(R2),n))^2
plot(beta.list,R2.fash,ylim=range(c(R2.fash,R2.cor,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2.cor,col=2)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R^2 fash','R^2 CorShrink','True R^2'),lty=c(1,1,1),col=c(1,2,'grey80'))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
plot(trueR2,R2.fash,type='l',ylim=range(c(R2.fash,R2.cor,trueR2)),ylab='')
lines(trueR2,R2.cor,col=2)
lines(trueR2,trueR2,col='grey80')
legend('topleft',c('R^2 fash','R^2 CorShrink'),lty=c(1,1),col=c(1,2))
Version | Author | Date |
---|---|---|
9ce257c | Dongyue Xie | 2019-01-27 |
n=100, p=5
library(CorShrink)
set.seed(1234)
n=100
p=5
R2=c()
#R2a=c()
trueR2=c()
beta.list=seq(0,1,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X%*%beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2.fash=ash_r2(R2,n,p)$shrinked.r2
R2.cor=(CorShrinkVector(sqrt(R2),n))^2
plot(beta.list,R2.fash,ylim=range(c(R2.fash,R2.cor,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2.cor,col=2)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R^2 fash','R^2 CorShrink','True R^2'),lty=c(1,1,1),col=c(1,2,'grey80'))
plot(trueR2,R2.fash,type='l',ylim=range(c(R2.fash,R2.cor,trueR2)),ylab='')
lines(trueR2,R2.cor,col=2)
lines(trueR2,trueR2,col='grey80')
legend('bottomright',c('R^2 fash','R^2 CorShrink'),lty=c(1,1),col=c(1,2))
n=20, p=3
set.seed(1234)
n=20
p=3
R2=c()
#R2a=c()
trueR2=c()
beta.list=seq(0,1,length.out = 100)
X=matrix(runif(n*(p),0,2),n,p)
for (i in 1:length(beta.list)) {
beta=rep(beta.list[i],p)
y=X%*%beta+rnorm(n)
datax=data.frame(X=X,y=y)
mod=lm(y~X,datax)
mod.sy=summary(mod)
R2[i]=mod.sy$r.squared
R2a[i]=mod.sy$adj.r.squared
trueR2[i]=1-(1)/(1+var(X%*%beta))
}
R2.fash=ash_r2(R2,n,p)$shrinked.r2
R2.cor=(CorShrinkVector(sqrt(R2),n))^2
plot(beta.list,R2.fash,ylim=range(c(R2.fash,R2.cor,trueR2)),main='',xlab='beta',ylab='',type='l')
lines(beta.list,R2.cor,col=2)
lines(beta.list,trueR2,col='grey80')
abline(h=0,lty=2)
legend('topleft',c('R^2 fash','R^2 CorShrink','True R^2'),lty=c(1,1,1),col=c(1,2,'grey80'))
plot(trueR2,R2.fash,type='l',ylim=range(c(R2.fash,R2.cor,trueR2)),ylab='')
lines(trueR2,R2.cor,col=2)
lines(trueR2,trueR2,col='grey80')
legend('bottomright',c('R^2 fash','R^2 CorShrink'),lty=c(1,1),col=c(1,2))
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] CorShrink_0.1-6 ashr_2.2-7
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 plyr_1.8.4 compiler_3.5.1
[4] git2r_0.23.0 workflowr_1.1.1 R.methodsS3_1.7.1
[7] R.utils_2.7.0 iterators_1.0.10 tools_3.5.1
[10] digest_0.6.17 corrplot_0.84 evaluate_0.11
[13] gtable_0.2.0 lattice_0.20-35 Matrix_1.2-14
[16] foreach_1.4.4 yaml_2.2.0 parallel_3.5.1
[19] gridExtra_2.3 stringr_1.3.1 knitr_1.20
[22] REBayes_1.3 rprojroot_1.3-2 grid_3.5.1
[25] glmnet_2.0-16 rmarkdown_1.10 reshape2_1.4.3
[28] corpcor_1.6.9 magrittr_1.5 whisker_0.3-2
[31] backports_1.1.2 codetools_0.2-15 htmltools_0.3.6
[34] MASS_7.3-51.1 assertthat_0.2.0 stringi_1.2.4
[37] Rmosek_8.0.69 doParallel_1.0.14 pscl_1.5.2
[40] truncnorm_1.0-8 SQUAREM_2017.10-1 R.oo_1.22.0
This reproducible R Markdown analysis was created with workflowr 1.1.1