<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="author" content="Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens" /> <title>GTEx SNP-gene association statistics used in mash analysis</title> <script src="site_libs/jquery-1.11.3/jquery.min.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1" /> <link href="site_libs/bootstrap-3.3.5/css/readable.min.css" rel="stylesheet" /> <script src="site_libs/bootstrap-3.3.5/js/bootstrap.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/html5shiv.min.js"></script> <script src="site_libs/bootstrap-3.3.5/shim/respond.min.js"></script> <script src="site_libs/navigation-1.1/tabsets.js"></script> <link href="site_libs/highlightjs-9.12.0/textmate.css" rel="stylesheet" /> <script src="site_libs/highlightjs-9.12.0/highlight.js"></script> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre:not([class]) { background-color: white; } </style> <script type="text/javascript"> if (window.hljs) { hljs.configure({languages: []}); hljs.initHighlightingOnLoad(); if (document.readyState && document.readyState === "complete") { window.setTimeout(function() { hljs.initHighlighting(); }, 0); } } </script> <style type="text/css"> h1 { font-size: 34px; } h1.title { font-size: 38px; } h2 { font-size: 30px; } h3 { font-size: 24px; } h4 { font-size: 18px; } h5 { font-size: 16px; } h6 { font-size: 12px; } .table th:not([align]) { text-align: left; } </style> </head> <body> <style type = "text/css"> .main-container { max-width: 940px; margin-left: auto; margin-right: auto; } code { color: inherit; background-color: rgba(0, 0, 0, 0.04); } img { max-width:100%; height: auto; } .tabbed-pane { padding-top: 12px; } button.code-folding-btn:focus { outline: none; } </style> <style type="text/css"> /* padding for 
bootstrap navbar */ body { padding-top: 51px; padding-bottom: 40px; } /* offset scroll position for anchor links (for fixed navbar) */ .section h1 { padding-top: 56px; margin-top: -56px; } .section h2 { padding-top: 56px; margin-top: -56px; } .section h3 { padding-top: 56px; margin-top: -56px; } .section h4 { padding-top: 56px; margin-top: -56px; } .section h5 { padding-top: 56px; margin-top: -56px; } .section h6 { padding-top: 56px; margin-top: -56px; } </style>
<script>
// manage active state of menu based on current page
$(document).ready(function () {
  // active menu anchor
  href = window.location.pathname
  href = href.substr(href.lastIndexOf('/') + 1)
  if (href === "")
    href = "index.html";
  var menuAnchor = $('a[href="' + href + '"]');
  // mark it active
  menuAnchor.parent().addClass('active');
  // if it's got a parent navbar menu mark it active as well
  menuAnchor.closest('li.dropdown').addClass('active');
});
</script>
<div class="container-fluid main-container"> <!-- tabsets --> <script> $(document).ready(function () { window.buildTabsets("TOC"); }); </script> <!-- code folding --> <div class="navbar navbar-default navbar-fixed-top" role="navigation"> <div class="container"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar"> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="index.html">mash code resources</a> </div> <div id="navbar" class="navbar-collapse collapse"> <ul class="nav navbar-nav"> <li> <a href="index.html">Overview</a> </li> <li> <a href="https://github.com/stephenslab/mashr">mashr</a> </li> <li> <a href="gtexdata.html">GTEx data</a> </li> <li> <a href="gtexanalysis.html">GTEx analysis</a> </li> <li> <a href="fastqtl2mash.html">Fastqtl to mash</a> </li> </ul> <ul class="nav navbar-nav navbar-right"> <li> <a href="https://github.com/stephenslab/gtexresults">source</a> </li> </ul> 
</div><!--/.nav-collapse --> </div><!--/.container --> </div><!--/.navbar --> <!-- Add a small amount of space between sections. --> <style type="text/css"> div.section { padding-top: 12px; } </style> <div class="fluid-row" id="header"> <h1 class="title toc-ignore">GTEx SNP-gene association statistics used in mash analysis</h1> <h4 class="author"><em>Sarah Urbut, Gao Wang, Peter Carbonetto and Matthew Stephens</em></h4> </div> <p><strong>Last updated:</strong> 2018-06-21</p> <strong>workflowr checks:</strong> <small>(Click a bullet for more information)</small> <ul> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>R Markdown file:</strong> up-to-date </summary></p> <p>Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.</p> </details> </li> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>Environment:</strong> empty </summary></p> <p>Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.</p> </details> </li> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>Seed:</strong> <code>set.seed(1)</code> </summary></p> <p>The command <code>set.seed(1)</code> was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.</p> </details> </li> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>Session information:</strong> recorded </summary></p> <p>Great job! 
Recording the operating system, R version, and package versions is critical for reproducibility.</p> </details> </li> <li> <p><details> <summary> <strong style="color:blue;">✔</strong> <strong>Repository version:</strong> <a href="https://github.com/stephenslab/gtexresults/tree/f224917fc0ba4ca188f24ba94719389dac27cd20" target="_blank">f224917</a> </summary></p> Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated. <br><br> Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use <code>wflow_publish</code> or <code>wflow_git_commit</code>). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated: <pre><code>
Ignored files:
    Ignored:    output/bmaonlybetasd5lfsr.txt.gz
    Ignored:    output/bmaonlybetasd5posterior.means.txt.gz
    Ignored:    output/independentsim.rds
    Ignored:    output/independentsimesd05.rds
    Ignored:    output/indsim05sdlfsr.txt.gz
    Ignored:    output/indsim05sdposterior.means.txt.gz
    Ignored:    output/noashsharedwithzerobmaalllfsr.txt.gz
    Ignored:    output/noashsharedwithzerobmaallposterior.means.txt.gz
    Ignored:    output/sharedashcutoffomega2jun15lfsr.txt.gz
    Ignored:    output/sharedashcutoffomega2jun15posterior.means.txt.gz
    Ignored:    output/simdata.rds
    Ignored:    output/univariate.ash.lfsr.txt.gz
    Ignored:    output/univariate.ash.pm.txt.gz
    Ignored:    output/univariate.ash.pmindesd.txt.gz
    Ignored:    output/univariate.ashind.lfsresd.txt.gz

Unstaged changes:
    Modified:   analysis/_site.yml

</code></pre> Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. 
</details> </li> </ul> <details> <summary> <small><strong>Expand here to see past versions:</strong></small> </summary> <ul> <table style="border-collapse:separate; border-spacing:5px;"> <thead> <tr> <th style="text-align:left;"> File </th> <th style="text-align:left;"> Version </th> <th style="text-align:left;"> Author </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Message </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/f224917fc0ba4ca188f24ba94719389dac27cd20/analysis/gtexdata.Rmd" target="_blank">f224917</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-21 </td> <td style="text-align:left;"> wflow_publish(c(“index.Rmd”, “gtexanalysis.Rmd”, “gtexdata.Rmd”)) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/58f85cb86cd374560990e7938fe5954944eb661f/docs/gtexdata.html" target="_blank">58f85cb</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-21 </td> <td style="text-align:left;"> I have a complete first draft of the gtexdata page. 
</td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/783da70c7526f89192fc3c58c0b0c4fa04351594/analysis/gtexdata.Rmd" target="_blank">783da70</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-21 </td> <td style="text-align:left;"> wflow_publish(“gtexdata.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> html </td> <td style="text-align:left;"> <a href="https://cdn.rawgit.com/stephenslab/gtexresults/fe8933344e2416b77065d0c560e7ea6f9503064c/docs/gtexdata.html" target="_blank">fe89333</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-21 </td> <td style="text-align:left;"> Added info to gtexdata page. </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/f77307a55158a6f900c4ef4ed9762c8b4a5447f3/analysis/gtexdata.Rmd" target="_blank">f77307a</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-21 </td> <td style="text-align:left;"> wflow_publish(“gtexdata.Rmd”) </td> </tr> <tr> <td style="text-align:left;"> Rmd </td> <td style="text-align:left;"> <a href="https://github.com/stephenslab/gtexresults/blob/87259b7e8e0614ac94f9b252c0d8c1c4adc1186c/analysis/gtexdata.Rmd" target="_blank">87259b7</a> </td> <td style="text-align:left;"> Peter Carbonetto </td> <td style="text-align:left;"> 2018-06-21 </td> <td style="text-align:left;"> wflow_publish(“gtexdata.Rmd”) </td> </tr> </tbody> </table> </ul> <p></details></p> <hr /> <div id="overview" class="section level2"> <h2>Overview</h2> <p>To apply multivariate adaptive shrinkage (<em>mash</em>) to data from the <a href="http://gtexportal.org">GTEx study</a>, we created an R data set (serialized R object) containing matrices of SNP-gene association statistics. 
These association statistics include effect estimates, <em>Z</em> scores and corresponding standard errors.</p> <p>See <a href="fastqtl2mash.html">here</a> for the scripts used to generate these statistics from the SNP-gene data that were provided by the <a href="http://gtexportal.org">GTEx Project</a>.</p> </div> <div id="how-to-download-the-data" class="section level2"> <h2>How to download the data</h2> <p>These are the recommended steps for retrieving the GTEx SNP-gene association statistics:</p> <ol style="list-style-type: decimal"> <li><p>Download or clone the <a href="https://github.com/stephenslab/gtexresults">git repository</a>.</p></li> <li><p>The association statistics are found in the file <code>MatrixEQTLSumStats.Portable.Z.rds</code>, in the <code>data</code> directory of the repository.</p></li> </ol> </div> <div id="how-to-load-the-data-into-r" class="section level2"> <h2>How to load the data into R</h2> <p>Change the working directory in R (or RStudio) to the <code>analysis</code> directory of the <code>gtexresults</code> repository, e.g.,</p> <pre class="r"><code>setwd("gtexresults/analysis")</code></pre> <p>Next, read the data object into R:</p> <pre class="r"><code>dat <- readRDS("../data/MatrixEQTLSumStats.Portable.Z.rds")</code></pre> <p>Then get an overview of the data from this file:</p> <pre class="r"><code>names(dat)
#  [1] "strong.b"      "strong.s"      "strong.z"      "random.b"
#  [5] "random.s"      "random.z"      "random.test.b" "random.test.s"
#  [9] "random.test.z" "vhat"</code></pre> </div> <div id="description-of-the-data" class="section level2"> <h2>Description of the data</h2> <p>This file contains SNP-gene association statistics for 16,069 genes and 44 human tissues. These 16,069 genes were selected because they all showed some indication of being expressed in all 44 tissues. 
Therefore, the association statistics are stored as matrices with 44 columns, one per tissue; for example, the matrices for the “strong” tests described below have 16,069 rows, one per gene:</p> <pre class="r"><code>dim(dat$strong.b)
# [1] 16069    44</code></pre> <p>As input to mash, we use a matrix of expression quantitative trait loci (eQTL) effect estimates and a matrix of the corresponding standard errors. (We also provide <em>Z</em> scores.) See the manuscript for details on how these association statistics were obtained.</p> <p>These association statistics were subdivided into three subsets:</p> <ol style="list-style-type: decimal"> <li><p>Results from a subset of “strong” tests. These tests were identified by taking the “top eQTL” in each gene based on univariate SNP-gene association tests. (Here, the “top eQTL” for a given gene is defined as the SNP with the largest (univariate) <em>Z</em> statistic among all 44 tissues.) The estimated effects, <em>Z</em> scores and standard errors for the strong tests are stored in three <span class="math inline">\(16,069 \times 44\)</span> matrices, <code>dat$strong.b</code>, <code>dat$strong.z</code> and <code>dat$strong.s</code>.</p></li> <li><p>Results from a random subset of 20,000 SNP-gene tests (this includes both “null” and “non-null” tests). The estimated effects, <em>Z</em> scores and standard errors for these random tests are stored in three <span class="math inline">\(20,000 \times 44\)</span> matrices, <code>dat$random.b</code>, <code>dat$random.z</code> and <code>dat$random.s</code>.</p></li> <li><p>Results from a second random subset of 28,198 SNP-gene tests. This is used for the cross-validation part of the mash analysis. 
The estimated effects, <em>Z</em> scores and standard errors for these random tests are stored in three <span class="math inline">\(28,198 \times 44\)</span> matrices, <code>dat$random.test.b</code>, <code>dat$random.test.z</code> and <code>dat$random.test.s</code>.</p></li> </ol> <p>Finally, the gene expression measurements in the GTEx study are correlated due to sample overlap (sometimes multiple measurements were obtained from the same individual). Therefore, we have also estimated a correlation matrix, which is stored in <code>dat$vhat</code>:</p> <pre class="r"><code>dim(dat$vhat)
# [1] 44 44</code></pre> <p>See the manuscript for additional details on how these data are used in the mash analysis.</p> </div> <div id="session-information" class="section level2"> <h2>Session information</h2> <pre class="r"><code>sessionInfo()
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS High Sierra 10.13.5
#
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
#
# loaded via a namespace (and not attached):
#  [1] workflowr_1.0.1.9000 Rcpp_0.12.17         digest_0.6.15
#  [4] rprojroot_1.3-2      R.methodsS3_1.7.1    backports_1.1.2
#  [7] git2r_0.21.0         magrittr_1.5         evaluate_0.10.1
# [10] stringi_1.1.7        whisker_0.3-2        R.oo_1.21.0
# [13] R.utils_2.6.0        rmarkdown_1.9        tools_3.4.3
# [16] stringr_1.3.0        yaml_2.1.18          compiler_3.4.3
# [19] htmltools_0.3.6      knitr_1.20</code></pre> </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ "HTML-CSS": { availableFonts: ["TeX"] } }); </script> <!-- Adjust MathJax settings so that all math formulae are shown using TeX fonts only; see http://docs.mathjax.org/en/latest/configuration.html. 
This will make the presentation more consistent at the cost of the webpage sometimes taking slightly longer to load. Note that this only works because the footer is added to webpages before the MathJax javascript. --> <hr> <p> This reproducible <a href="http://rmarkdown.rstudio.com">R Markdown</a> analysis was created with <a href="https://github.com/jdblischak/workflowr">workflowr</a> 1.0.1.9000 </p> <hr> </div>
<script>
// add bootstrap table styles to pandoc tables
function bootstrapStylePandocTables() {
  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
}
$(document).ready(function () {
  bootstrapStylePandocTables();
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
  var script = document.createElement("script");
  script.type = "text/javascript";
  script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
  document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body> </html>