Last updated: 2018-07-20
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Repository version: 8911eb8
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: docs/.DS_Store
Ignored: docs/figure/.DS_Store
Unstaged changes:
Modified: analysis/index.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 8911eb8 | Jason Willwerscheid | 2018-07-20 | wflow_publish(c(“analysis/obj_notes.Rmd”, |
html | de6736c | Jason Willwerscheid | 2018-07-18 | Build site. |
Rmd | 4d6f3da | Jason Willwerscheid | 2018-07-18 | wflow_publish(“analysis/obj_notes.Rmd”) |
html | f34a82b | Jason Willwerscheid | 2018-07-18 | Build site. |
Rmd | 6c2cefd | Jason Willwerscheid | 2018-07-18 | wflow_publish(“analysis/obj_notes.Rmd”) |
html | cc77169 | Jason Willwerscheid | 2018-07-17 | Build site. |
Rmd | 87f4d40 | Jason Willwerscheid | 2018-07-17 | wflow_publish(“analysis/obj_notes.Rmd”) |
html | e77fc4c | Jason Willwerscheid | 2018-07-17 | Build site. |
Rmd | eb42402 | Jason Willwerscheid | 2018-07-17 | wflow_publish(“analysis/obj_notes.Rmd”) |
html | 9678634 | Jason Willwerscheid | 2018-07-17 | Build site. |
Rmd | 3e874c6 | Jason Willwerscheid | 2018-07-17 | wflow_publish(“analysis/obj_notes.Rmd”) |
html | 02765a9 | Jason Willwerscheid | 2018-07-17 | Build site. |
Rmd | e811943 | Jason Willwerscheid | 2018-07-17 | wflow_publish(“analysis/obj_notes.Rmd”) |
html | 0621185 | Jason Willwerscheid | 2018-07-17 | Build site. |
Rmd | a0bccf9 | Jason Willwerscheid | 2018-07-17 | wflow_publish(“analysis/obj_notes.Rmd”) |
Recall the FLASH model: \[ Y = LF' + E \]
When updating loading \(l_k\), we are optimizing over \(g_{l_k}\) and \(q_{l_k}\). \(g_{l_k} \in \mathcal{G}\) is the prior on the elements of the \(k\)th column of the loadings matrix: \[ l_{1k}, \ldots, l_{nk} \sim^{iid} g_{l_k} \] \(q_{l_k}\) is an arbitrary distribution which enters into the problem via the variational approach. For convenience, I drop the subscripts in the following.
The part of the objective that depends on \(g\) and \(q\) is \[ F(g, q) := E_q \left[ -\frac{1}{2} \sum_i (A_i l_i^2 - 2 B_i l_i) \right] + E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})} \] with \[ A_i = \sum_j \tau_{ij} Ef^2_j \text{ and } B_i = \sum_j \tau_{ij} R_{ij} Ef_j, \] (\(R\) is the matrix of residuals (excluding factor \(k\)) and \(Ef_j\) and \(Ef^2_j\) are the expected values of \(f_{jk}\) and \(f_{jk}^2\) with respect to the distribution \(q_{f_k}\) fitted during the factor update.)
As Lemma 2 in the paper shows (see Appendix A.2), this expression is optimized by setting \(s_j^2 = A_j\) and \(x_j = B_j s_j^2\), and then solving the EBNM problem, where the EBNM model is: \[ \mathbf{x} = \mathbf{\theta} + \mathbf{e},\ \theta_1, \ldots, \theta_n \sim^{iid} g,\ e_j \sim N(0, s_j^2) \]
Solving the EBNM problem gives \[\hat{g} = {\arg \max}_g\ p(x \mid g) \] and \[ \hat{q} = p(\theta \mid x, \hat{g}) \]
Finally, to update the overall objective, we need to compute \(E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})}\). FLASH uses a clever trick, noticing that \[ E_{\hat{q}} \log \frac{\hat{g}(\mathbf{l})}{\hat{q}(\mathbf{l})} = F(\hat{g}, \hat{q}) + \frac{1}{2} \sum_j \left[ \log 2\pi s_j^2 + (1/s_j^2) E_{\hat{q}} (x_j - \theta_j)^2 \right] \] (See Appendix A.4.)
When using ebnm_pn
, however, it seems possible to compute \(E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})}\) directly. Since the elements \(l_1, \ldots, l_n\) are i.i.d. from \(g\) (by the FLASH model) and the posterior distributions are mutually independent (by the EBNM model), \[ E_q \log \frac{g(\mathbf{l})}{q(\mathbf{l})}
= \sum_i E_{q_i} \log \frac{g(l_i)}{q(l_i)} \]
I drop the subscripts \(i\). Write \[ g \sim \pi_0 \delta_0 + (1 - \pi_0) N(0, 1/a) \] and \[ q \sim (1 - \tilde{w}) \delta_0 + \tilde{w} N(\tilde{\mu}, \tilde{\sigma}^2) \] (I use different parametrizations to make the derivation cleaner and to follow the code more closely.)
Then \[\begin{aligned} E_q \log \frac{g(l)}{q(l)} &= (1 - \tilde{w}) \log \frac{\pi_0}{1 - \tilde{w}} + \int \tilde{w}\ \text{dnorm}(x; \tilde{\mu}, \tilde{\sigma}^2) \log \frac{(1 - \pi_0)\text{dnorm}(x; 0, 1/a)} {\tilde{w}\ \text{dnorm}(x; \tilde{\mu}, \tilde{\sigma}^2)}\ dx \\ &= (1 - \tilde{w}) \log \frac{\pi_0}{1 - \tilde{w}} + \tilde{w} \log \frac{1 - \pi_0}{\tilde{w}} \\ &\ + \int \tilde{w}\ \text{dnorm}(x; \tilde{\mu}, \tilde{\sigma}^2) \log \left( \sqrt{a \tilde{\sigma}^2} \exp \left( -\frac{ax^2}{2} + \frac{(x - \tilde{\mu})^2}{2 \tilde{\sigma}^2} \right) \right) \ dx \end{aligned}\]
The last integral is equal to \[\begin{aligned} \frac{\tilde{w}}{2} \log (a \tilde{\sigma}^2) &- \frac{\tilde{w} a}{2} E_{N(x; \tilde{\mu}, \tilde{\sigma}^2)} x^2 + \frac{\tilde{w}}{2 \tilde{\sigma}^2} E_{N(x; \tilde{\mu}, \tilde{\sigma}^2)} (x - \tilde{\mu})^2 \\ &= \frac{\tilde{w}}{2} \log (a \tilde{\sigma}^2) - \frac{\tilde{w} a}{2} (\tilde{\mu}^2 + \tilde{\sigma}^2) + \frac{\tilde{w}}{2} \end{aligned}\]
Thus \[ E_q \log \frac{g(l)}{q(l)} = (1 - \tilde{w}) \log \frac{\pi_0}{1 - \tilde{w}} + \tilde{w} \log \frac{1 - \pi_0}{\tilde{w}} + \frac{\tilde{w}}{2} \left( \log (a \tilde{\sigma}^2) - a(\tilde{\mu}^2 + \tilde{\sigma}^2) + 1 \right) \]
This reproducible R Markdown analysis was created with workflowr 1.0.1