When only the most extreme observation is known

Last updated: 2017-02-27

Code version: 0a13d52

Introduction

Matthew raised the following question: suppose the only thing we know is the most extreme observation \((\hat\beta_{(n)}, \hat s_{(n)})\), together with the total number of observations \(n\). What does this single data point tell us?

Model

Start with our usual ash model.

\[ \begin{array}{c} \hat\beta_j | \hat s_j, \beta_j \sim N(\beta_j, \hat s_j^2)\\ \beta_j \sim \sum_k\pi_k N(0, \sigma_k^2) \end{array} \]

Now we only observe \((\hat\beta_{(n)}, \hat s_{(n)})\), together with the information that \(|\hat\beta_{(n)}/\hat s_{(n)}| \geq |\hat\beta_{j}/\hat s_{j}|\), \(j = 1, \ldots, n\). This essentially separates the \(n\) observations into two groups.

\[ \text{Group 1: }(\hat\beta_{(1)}, \hat s_{(1)}), \ldots, (\hat\beta_{(n - 1)}, \hat s_{(n - 1)}), \text{ with } |\hat\beta_{(j)}/\hat s_{(j)}| \leq t = |\hat\beta_{(n)}/\hat s_{(n)}|, \ j = 1, \ldots, n - 1 \]

\[ \text{Group 2: }(\hat\beta_{(n)}, \hat s_{(n)}), \text{ with } |\hat\beta_{(n)}/\hat s_{(n)}| = t \]

In other words, this should be equivalent to truncash with the threshold \(t = |\hat\beta_{(n)}/\hat s_{(n)}|\), at least from the likelihood principle point of view.
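To make the two-group split concrete, here is a minimal R sketch that simulates from the model above and keeps only the most extreme observation. The mixture proportions `pi_k`, the component standard deviations `sigma_k`, the sample size, and the choice \(\hat s_j = 1\) are all illustrative assumptions, not values used elsewhere in this analysis.

```r
set.seed(1)
n       <- 1000                 # total number of observations (illustrative)
pi_k    <- c(0.8, 0.15, 0.05)   # hypothetical mixture proportions
sigma_k <- c(0, 1, 3)           # hypothetical component sds; 0 is a point mass at zero

# Draw beta_j from the normal mixture prior, then betahat_j given beta_j and s_j
k       <- sample(seq_along(pi_k), n, replace = TRUE, prob = pi_k)
beta    <- rnorm(n, mean = 0, sd = sigma_k[k])
s       <- rep(1, n)            # assumed standard errors
betahat <- rnorm(n, mean = beta, sd = s)

# The single data point we get to keep: the pair with the largest |betahat / s|
z       <- betahat / s
i_max   <- which.max(abs(z))
t_thr   <- abs(z[i_max])        # the threshold t
extreme <- c(betahat = betahat[i_max], s = s[i_max])

# Everything else falls in Group 1 by construction: |z_j| <= t
group1 <- abs(z[-i_max]) <= t_thr
```

Under these assumptions, all that is known about the `n - 1` observations in Group 1 is that each satisfies `abs(z) <= t_thr`, which is the information a truncation-based likelihood would use.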

Back-of-the-envelope calculation

Suppose \(X_1 \sim F_1, X_2\sim F_2, \ldots, X_n \sim F_n\) independently, where \(F_i\) is the cdf of the random variable \(X_i\), with pdf \(f_i\). In ash’s setting, we can think of \(X_i = |\hat\beta_i/ \hat s_i|\), and \(f_i\) is the convolution of a common unimodal distribution \(g\) (to be estimated) and the idiosyncratic likelihood of \(|\hat\beta_i / \hat s_i|\) given \(\hat s_i\) (usually related to normal or Student’s t, but could be generalized to others). Let \(X_{(n)}:=\max\{X_1, X_2, \ldots, X_n\}\) be the maximum of these \(n\) random variables.

\[ \begin{array}{rl} & P(X_{(n)} \leq t) = \prod_{i = 1}^n F_i(t) \\ \Rightarrow & p_{X_{(n)}}(t) = dP(X_{(n)} \leq t)/dt = \sum_{i = 1}^n f_i(t) \prod_{j \neq i} F_j(t) \end{array} \]
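The cdf of the maximum, and the density obtained by differentiating it with the product rule, can be checked numerically. The R sketch below assumes, purely for illustration, that each \(X_i\) is the absolute value of an independent \(N(0, \mathtt{sd\_i}^2)\) variable, so that \(F_i\) and \(f_i\) are half-normal. The standard deviations `sd_i` and the evaluation point `t0` are hypothetical, and in ash’s setting \(f_i\) would be a mixture rather than a single normal.

```r
set.seed(1)
sd_i <- c(1, 1.5, 2, 2.5, 3)    # hypothetical marginal sds of betahat_i / s_i
n    <- length(sd_i)

# cdf and pdf of X_i = |Z_i|, Z_i ~ N(0, sd^2) (half-normal)
F_i <- function(t, sd) 2 * pnorm(t / sd) - 1
f_i <- function(t, sd) 2 * dnorm(t / sd) / sd

# cdf of the maximum: product of the individual cdfs
cdf_max <- function(t) prod(F_i(t, sd_i))

# pdf of the maximum: sum_i f_i(t) * prod_{j != i} F_j(t)
pdf_max <- function(t) {
  Fv <- F_i(t, sd_i)
  fv <- f_i(t, sd_i)
  sum(sapply(seq_along(sd_i), function(i) fv[i] * prod(Fv[-i])))
}

t0 <- 4                         # arbitrary evaluation point

# Check 1: the density integrates to one over [0, Inf)
total <- integrate(Vectorize(pdf_max), lower = 0, upper = Inf)$value

# Check 2: the density matches a central finite difference of the cdf
h         <- 1e-5
fd_deriv  <- (cdf_max(t0 + h) - cdf_max(t0 - h)) / (2 * h)
deriv_gap <- abs(fd_deriv - pdf_max(t0))

# Check 3: the cdf matches the empirical cdf of simulated maxima
x_max  <- replicate(20000, max(abs(rnorm(n, mean = 0, sd = sd_i))))
cdf_mc <- mean(x_max <= t0)
```

If the formula is right, `total` should be close to 1, `deriv_gap` close to 0, and `cdf_mc` close to `cdf_max(t0)` up to Monte Carlo error.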

Simulation

Session Information

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] backports_1.0.5 magrittr_1.5    rprojroot_1.2   tools_3.3.2    
 [5] htmltools_0.3.5 yaml_2.1.14     Rcpp_0.12.9     stringi_1.1.2  
 [9] rmarkdown_1.3   knitr_1.15.1    git2r_0.18.0    stringr_1.1.0  
[13] digest_0.6.11   workflowr_0.3.0 evaluate_0.10

This R Markdown site was created with workflowr