Last updated: 2018-02-08

Code version: 1a2b7bf

Introduction

In the Knockoff paper simulations, the columns of \(X\) are either independent or simulated from a Toeplitz correlation structure where \(Cor(X_i, X_j) = \rho^{|i - j|}\). Here we replicate the independent-column results and investigate how well Knockoff deals with other correlation structures.

In all of the following simulations, \(n = 3000\) and \(p = 1000\). Of the \(p = 1000\) coefficients \(\beta_j\), \(950\) are zero, and the remaining \(k = 50\) signals all have \(\beta_j = A\). Given \(X\), the response is \(Y_n \sim N(X_{n\times p}\beta_p, I_n)\). The columns of \(X_{n \times p}\) are generated under one of the three scenarios below. All simulations use an FDR cutoff of \(q = 0.1\).

n <- 3000
p <- 1000
k <- 50
q <- 0.1
  • Scenario 1: Each row of \(X\) is independently drawn from \(N(0, I_p)\). All columns of \(X\) are normalized such that \(\|X_j\|_2^2 = 1\). The signal magnitude is \(A = 3.5\).
  • Scenario 2: Each row of \(X\) is independently drawn from \(N(0, \Sigma_X)\), where \(\Sigma_X = \texttt{cov2cor}(B_{p \times d}B_{p \times d}^T + I)\) and \(d = 5\). All columns of \(X\) are normalized such that \(\|X_j\|_2^2 = 1\). The signal magnitude is set to \(A = 9\) so that the signal is clearly stronger than the noise level \(\text{SE}(\hat\beta) = \sqrt{\text{diag}[(X^TX)^{-1}]}\). In this case the columns of \(X\) have substantial average correlation, but the \(\hat\beta_j\)’s do not necessarily.
  • Scenario 3: Each row of \(X\) (with normalization) is independently drawn from \(N(0, \Sigma_{\hat\beta}^{-1})\), where \(\Sigma_{\hat\beta} = \texttt{cov2cor}(B_{p \times d}B_{p \times d}^T + I)\) and \(d = 5\), so that \((X^TX)^{-1}\approx\Sigma_{\hat\beta}\). In this case the \(\hat\beta_j\)’s have substantial average correlation. The signal magnitude is \(A = 3.5\).
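The two factor-model constructions can be checked numerically on a small example. The following is a minimal sketch, not part of the original analysis: the reduced dimensions `p_small` and `n_small` and all variable names are assumptions chosen so that it runs quickly. It checks that multiplying independent normal rows by `chol(Rho)` gives rows with covariance `Rho` (Scenario 2), and that transforming by `chol(solve(Rho))` gives a design whose implied \((X^TX)^{-1}\) equals `Rho` when the untransformed columns are exactly orthonormal (Scenario 3).

```r
set.seed(1)
p_small <- 50      # hypothetical reduced dimension (the simulations use p = 1000)
n_small <- 20000   # hypothetical sample size, large so empirical correlations are stable
d <- 5

## Factor-model correlation matrix, as in Scenarios 2 and 3
B <- matrix(rnorm(p_small * d), p_small, d)
Rho <- cov2cor(tcrossprod(B) + diag(p_small))

## Average absolute off-diagonal correlation implied by the factor model
avg_abs_cor <- mean(abs(Rho[upper.tri(Rho)]))

## Scenario 2: chol() returns upper-triangular R with t(R) %*% R = Rho,
## so rows of Z %*% R have covariance Rho when rows of Z are N(0, I)
R <- chol(Rho)
chol_err <- max(abs(crossprod(R) - Rho))
X2 <- matrix(rnorm(n_small * p_small), n_small, p_small) %*% R
emp_err <- max(abs(cor(X2) - Rho))

## Scenario 3: if the untransformed design Z satisfies t(Z) %*% Z = I, then
## X = Z %*% R3 has t(X) %*% X = t(R3) %*% R3 = solve(Rho),
## hence solve(t(X) %*% X) = Rho
R3 <- chol(solve(Rho))
identity_err <- max(abs(solve(crossprod(R3)) - Rho))
```

In the simulations the normalized columns are only approximately orthogonal, which is why Scenario 3 states \((X^TX)^{-1}\approx\Sigma_{\hat\beta}\) rather than an exact equality.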

Scenario 1: Independent \(X\) columns

set.seed(777)
## Independent columns
X <- matrix(rnorm(n * p), n , p)
## Normalization
X <- t(t(X) / sqrt(colSums(X^2)))
## Generate knockoffs
Xk <- knockoff::create.fixed(X)
Xk <- Xk$Xk
## Average sebetahat
sqrt(mean(diag(solve(crossprod(X)))))
[1] 1.224379
## Signal strength
A <- 3.5
## Set beta
beta <- rep(0, p)
nonzero <- sample(p, k)
beta[nonzero] <- A
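The code above stops at setting \(\beta\); the steps that simulate the response and score a selection are not shown on this page. The following is a minimal sketch of one way they could look for the BH comparison, under stated assumptions: reduced dimensions for speed, two-sided p-values from OLS z-scores with known unit noise variance (as in \(Y \sim N(X\beta, I_n)\)), and `p.adjust` for the BH step. The FDP and power definitions are the standard ones; whether the original analysis computed them exactly this way is an assumption.

```r
set.seed(777)
## Reduced sizes (assumption, for speed); the simulations use n = 3000, p = 1000, k = 50
n <- 300; p <- 100; k <- 10
q <- 0.1
A <- 3.5

## Independent, normalized columns as in Scenario 1
X <- matrix(rnorm(n * p), n, p)
X <- t(t(X) / sqrt(colSums(X^2)))

## Sparse signal
beta <- rep(0, p)
nonzero <- sample(p, k)
beta[nonzero] <- A

## Response with unit-variance Gaussian noise
y <- drop(X %*% beta) + rnorm(n)

## OLS estimates and their standard errors (noise variance known to be 1)
XtX_inv <- solve(crossprod(X))
betahat <- drop(XtX_inv %*% crossprod(X, y))
sebetahat <- sqrt(diag(XtX_inv))

## Two-sided p-values and BH selection at level q
pval <- 2 * pnorm(-abs(betahat / sebetahat))
selected <- which(p.adjust(pval, method = "BH") <= q)

## False discovery proportion and power for this single replicate
fdp <- sum(beta[selected] == 0) / max(1, length(selected))
power <- sum(beta[selected] != 0) / k
```

Averaging `fdp` and `power` over many independent replicates would give quantities comparable to a summary of FDP and power per method.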

Scenario 2: \(X\) from a factor model

set.seed(777)
## Generate correlation matrix of X
d <- 5
B <- matrix(rnorm(p * d, 0, 1), p, d)
Sigma.X <- tcrossprod(B) + diag(p)
Rho.X <- cov2cor(Sigma.X)
## Simulate X
X <- matrix(rnorm(n * p), n, p) %*% chol(Rho.X)
## Normalization
X <- t(t(X) / sqrt(colSums(X^2)))
## Generate knockoffs
Xk <- knockoff::create.fixed(X)
Xk <- Xk$Xk
## Average sebetahat
sqrt(mean(diag(solve(crossprod(X)))))
[1] 3.015966
## Signal strength
A <- 9
## Set beta
beta <- rep(0, p)
nonzero <- sample(p, k)
beta[nonzero] <- A

Scenario 3: \(\hat\beta\) from a factor model

set.seed(777)
## Generate correlation matrix of betahat
d <- 5
B <- matrix(rnorm(p * d, 0, 1), p, d)
Sigma.betahat <- tcrossprod(B) + diag(p)
Cor.betahat <- cov2cor(Sigma.betahat)
## Simulate X with independent columns
X <- matrix(rnorm(n * p), n, p)
## Normalize X
X <- t(t(X) / sqrt(colSums(X^2)))
## Transform independent columns to have Sigma_betahat^{-1} correlation structure
X <- X %*% chol(solve(Cor.betahat))
## Generate knockoffs
Xk <- knockoff::create.fixed(X)
Xk <- Xk$Xk
## Average sebetahat
sqrt(mean(diag(solve(crossprod(X)))))
[1] 1.227621
## Signal strength
A <- 3.5
## Set beta
beta <- rep(0, p)
nonzero <- sample(p, k)
beta[nonzero] <- A

                          FDP.BH  FDP.Knockoff  FDP.Knockoff.Plus  Power.BH  Power.Knockoff  Power.Knockoff.Plus
Independent Columns       0.0936  0.0677        0.0431             0.4277    0.5400          0.4076
Factor Model for X        0.0956  0.4223        0.0000             0.6241    0.0066          0.0000
Factor Model for betahat  0.0755  0.3853        0.1986             0.4024    0.1322          0.0694
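The Knockoff and Knockoff.Plus columns presumably refer to the two standard variants of the knockoff selection rule, which differ only by an offset in the data-dependent threshold: Knockoff uses offset 0 and Knockoff+ uses offset 1. The sketch below writes that threshold in base R from its definition, \(T = \min\{t \in \{|W_j| : W_j \neq 0\} : (\text{offset} + \#\{j: W_j \le -t\}) / \max(1, \#\{j: W_j \ge t\}) \le q\}\). The function name `knockoff_threshold` and the toy statistics `W` are hypothetical, and this is an illustration rather than the code used to produce the table.

```r
## Data-dependent knockoff threshold; offset = 0 gives Knockoff, offset = 1 gives Knockoff+
knockoff_threshold <- function(W, q, offset) {
  candidates <- sort(unique(abs(W[W != 0])))
  for (thr in candidates) {
    ratio <- (offset + sum(W <= -thr)) / max(1, sum(W >= thr))
    if (ratio <= q) return(thr)
  }
  Inf  # no candidate satisfies the bound, so nothing is selected
}

## Toy knockoff statistics (hypothetical): large positive W_j suggests a signal
W <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, -1)
q <- 0.2

T_knockoff <- knockoff_threshold(W, q, offset = 0)
T_knockoff_plus <- knockoff_threshold(W, q, offset = 1)

selected_knockoff <- which(W >= T_knockoff)
selected_plus <- which(W >= T_knockoff_plus)

## With offset 1 the ratio is at least 1 / #{W_j >= t}, so fewer than 1/q
## positive statistics can never pass the bound
W_few <- c(3, 2, 1)
T_few <- knockoff_threshold(W_few, q = 0.1, offset = 1)
```

One consequence of the offset is that Knockoff+ can only select anything when at least \(1/q\) statistics exceed the threshold, so it returns an empty selection whenever too few variables stand out.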

Comparison across correlation levels (various \(d\))

Scenario 2: \(X\) from a factor model

Scenario 3: \(\hat\beta\) from a factor model

Session information

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.2.1  knitr_1.19     knockoff_0.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14     magrittr_1.5     munsell_0.4.3    colorspace_1.3-2
 [5] rlang_0.1.6      stringr_1.2.0    highr_0.6        plyr_1.8.4      
 [9] tools_3.4.3      grid_3.4.3       gtable_0.2.0     git2r_0.21.0    
[13] htmltools_0.3.6  yaml_2.1.16      lazyeval_0.2.1   rprojroot_1.3-2 
[17] digest_0.6.14    tibble_1.4.1     evaluate_0.10.1  rmarkdown_1.8   
[21] labeling_0.3     stringi_1.1.6    compiler_3.4.3   pillar_1.0.1    
[25] scales_0.5.0     backports_1.1.2 
