Last updated: 2017-06-07
Code version: a08b608
Let \(L_{n \times k} = \left[L_{ij}\right]_{n \times k}\) be a matrix, each entry of which is generated as follows.
Then taking \(L\) as known, let \(x \sim N\left(0, I_k\right)\) be a \(k\)-dimensional vector comprised of \(k\) independent \(N\left(0, 1\right)\) random variables. Then \[ z = Lx \sim N\left(0, LL^T\right) \] should be \(n\) marginally \(N\left(0, 1\right)\) but correlated \(z\) scores. Indeed, \[ \begin{array}{c} \text{var}\left(z_i\right) = l_i^Tl_i = 1 \ ; \\ \text{cov}\left(z_i, z_j\right) = l_i^Tl_j \neq 0 \text{, in general} \ ; \end{array} \] where \(l_i^T\) and \(l_j^T\) are \(i^\text{th}\) and \(j^\text{th}\) rows of \(L\) respectively.
Then we plot the histogram of \(n\) \(z\) scores. One interesting thing is we can prove what these histograms would look like when \(n\) is sufficiently large.
For example, when \(k = 4\), \(n\) is sufficiently large, say, \(10^6\), the histogram of \(z\) looks like a semicircle almost perfectly, as illustrated in the following simulation. The semicircle is centered at the origin, and has a radius of \(\left\|x\right\|_2\).
set.seed(1)
n = 1e6
k = 4
L = matrix(rnorm(n * k), nrow = n)
s = sqrt(rowSums(L^2))
L = L / s
x = rnorm(k)
z = L %*% x
hist(z, breaks = 100, prob = TRUE)
R = sqrt(sum(x^2))
x.plot = seq(-max(abs(z)) - 1, max(abs(z)) + 1, length = 1000)
y.plot = 2 * sqrt(pmax(R^2 - x.plot^2, 0)) / (pi * R^2)
lines(x.plot, y.plot, col = "red")
Actually, when \(k \neq 4\), for example, \(k = 3\) or \(k = 5\), the histograms of these correlated \(z\) scores, simulated the same way, look different, and their shapes when \(n \to \infty\) can be mathematically determined.
set.seed(1)
n = 1e6
k = 3
L = matrix(rnorm(n * k), nrow = n)
s = sqrt(rowSums(L^2))
L = L / s
x = rnorm(k)
z = L %*% x
hist(z, breaks = 100, prob = TRUE)
set.seed(1)
n = 1e6
k = 5
L = matrix(rnorm(n * k), nrow = n)
s = sqrt(rowSums(L^2))
L = L / s
x = rnorm(k)
z = L %*% x
hist(z, breaks = 100, prob = TRUE)
sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.5
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] backports_1.0.5 magrittr_1.5 rprojroot_1.2 tools_3.3.3
[5] htmltools_0.3.6 yaml_2.1.14 Rcpp_0.12.11 stringi_1.1.2
[9] rmarkdown_1.5 knitr_1.16 git2r_0.18.0 stringr_1.2.0
[13] digest_0.6.12 evaluate_0.10
This R Markdown site was created with workflowr