Last updated: 2018-06-04
workflowr checks:
✔ R Markdown file: up-to-date
Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Repository version: 340ad6f
workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .sos/
Ignored: data/.sos/
Ignored: output/MatrixEQTLSumStats.Portable.Z.coved.K3.P3.lite.single.expanded.V1.loglik.rds
Ignored: workflows/.ipynb_checkpoints/
Ignored: workflows/.sos/
Untracked files:
Untracked: fastqtl_to_mash_output/
Untracked: gtex6_workflow_output/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 340ad6f | Peter Carbonetto | 2018-06-04 | wflow_publish(c("index.Rmd", "gtex.Rmd", "fastqtl2mash.Rmd")) |
Rmd | 7ff4f67 | Peter Carbonetto | 2018-06-01 | wflow_publish("fastqtl2mash.Rmd", view = FALSE) |
html | 0bbaec2 | Peter Carbonetto | 2018-06-01 | A few small revisions to fastqtl2mash demo. |
Rmd | cb5e65c | Peter Carbonetto | 2018-06-01 | wflow_publish("fastqtl2mash.Rmd", view = FALSE) |
html | 55397e1 | Peter Carbonetto | 2018-06-01 | Updates to fastqtl2mash demo. |
Rmd | 68995c4 | Peter Carbonetto | 2018-06-01 | wflow_publish("fastqtl2mash.Rmd", view = FALSE) |
Rmd | ab370ff | Peter Carbonetto | 2018-06-01 | Revising fastqtl2mash instructions. |
Rmd | f59f052 | Peter Carbonetto | 2018-06-01 | wflow_publish("fastqtl2mash.Rmd") |
Rmd | 155cfc9 | Peter Carbonetto | 2018-06-01 | wflow_publish("fastqtl2mash.Rmd") |
html | 930e0f6 | Peter Carbonetto | 2018-06-01 | Build site. |
Rmd | 401fd65 | Peter Carbonetto | 2018-06-01 | wflow_publish("fastqtl2mash.Rmd") |
Rmd | 6a456e6 | Peter Carbonetto | 2018-06-01 | Moved some output files to data folder; removed some old files from |
We provide code to convert association statistics in FastQTL format, or a format similar to FastQTL, to a format that is more suited for analysis with mash. This code was used to generate MatrixEQTLSumStats.Portable.Z.rds in the git repository from the SNP-gene association statistics included as part of Release 6 of the GTEx Project (the source file was named GTEx_Analysis_V6_all-snp-gene-associations.tar).
Here we give instructions for using this code, and demonstrate how to convert a toy FastQTL data set. This toy data set is included in the git repository.
To facilitate running our conversion procedure, we have also developed a Docker container that includes all the required software components, notably the HDF5 libraries used to create intermediate data files that can be efficiently queried. Docker can run on most popular operating systems (Mac, Windows and Linux) and cloud computing services such as Amazon Web Services and Microsoft Azure. If you have not used Docker before, you might want to read this to learn the basic concepts and understand the main benefits of Docker.
For details on how the Docker image was configured, see hdf5tools.dockerfile in the workflows directory of the git repository. The Docker image used for our analyses is based on gaow/lab-base, a customized Docker image for development with R and Python.
If you find a bug in any of these steps, please post an issue.
Download Docker (note that a free community edition of Docker is available), and install it following the instructions provided on the Docker website. Once you have installed Docker, check that Docker is working correctly by following Part 1 of the “Getting Started” guide. If you are new to Docker, we recommend reading the entire “Getting Started” guide.
Note: Setting up Docker requires that you have administrator access to your computer. Singularity is an alternative that accepts Docker images and does not require administrator access.
Run the following alias command in the shell; the resulting alias is used below to run commands inside the Docker container:
alias fastqtl2mash-docker='docker run --security-opt label:disable -t '\
'-P -h MASH -w $PWD -v $HOME:/home/$USER -v /tmp:/tmp -v $PWD:$PWD '\
'-u $UID:${GROUPS[0]} -e HOME=/home/$USER -e USER=$USER gaow/hdf5tools'
The -v flags in this command map directories between the standard computing environment and the Docker container. Since the analyses below will write files to these directories, it is important to ensure that:
Environment variables $HOME and $PWD are set to valid and writeable directories (usually your home and current working directories, respectively).
/tmp is also a valid and writeable directory.
If either of these conditions does not hold, please adjust the alias accordingly. The remaining options only affect operation of the container, and so should function the same regardless of your operating system.
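The conditions above can be checked in the shell before the alias is used. The following is a minimal sketch and not part of the repository; the helper name check_writable_dir is hypothetical.

```shell
# Hypothetical helper (not part of the gtexresults repository): succeed
# only if the argument is an existing directory that the user can write to.
check_writable_dir() {
  if [ -d "$1" ] && [ -w "$1" ]; then
    echo "ok: $1"
  else
    echo "not a writable directory: $1" >&2
    return 1
  fi
}

# Check the three host directories that the -v flags map into the container.
for dir in "$HOME" "$PWD" /tmp; do
  check_writable_dir "$dir" || echo "adjust the alias for: $dir" >&2
done
```

If any directory is reported as not writable, the corresponding -v mapping in the alias is the one to change.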
Next, run a simple command in the Docker container to check that it has loaded successfully:
fastqtl2mash-docker uname -sn
This command will download the Docker image if it has not already been downloaded.
If the container ran successfully, you should see this information about the Docker container printed to the screen:
Linux MASH
You can also run these commands to show the information about the image downloaded to your computer and the container that has run (and exited):
docker image list
docker container list --all
Note: If you get the error “Cannot connect to the Docker daemon. Is the docker daemon running on this host?” on Linux or macOS, see here for Linux or here for Mac for suggestions on how to resolve this issue.
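One way to detect this situation before running the container is to ask the Docker client whether it can reach the daemon. The sketch below is an illustration only; the function name docker_daemon_ok is hypothetical, and it relies on docker info exiting with a non-zero status when the daemon cannot be reached.

```shell
# Hypothetical helper: succeed only if the docker client is available
# and "docker info" can reach the daemon.
docker_daemon_ok() {
  command -v docker >/dev/null 2>&1 || {
    echo "docker not found on PATH" >&2
    return 1
  }
  docker info >/dev/null 2>&1 || {
    echo "docker daemon is not reachable" >&2
    return 1
  }
  echo "docker daemon is running"
}
```

In an interactive shell, running docker_daemon_ok before fastqtl2mash-docker uname -sn separates daemon problems from problems with the image itself.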
Clone or download the gtexresults repository to your computer, then change your working directory in the shell to the root of the repository, e.g.,
cd gtexresults
All the commands below will be run from this directory.
Next, use the fastqtl_to_mash.ipynb code in the workflows directory to convert the toy data set in FastQTL format to the mash format. The toy data are stored in the data/fastqtl subdirectory of the git repository.
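Before starting the conversion, it can help to confirm that the two input files used in this demo are present, which also confirms that the working directory is the root of the repository. This is a sketch only; the helper name check_inputs is hypothetical.

```shell
# Hypothetical helper: report every argument that is not a non-empty file,
# and return a non-zero status if any such file was found.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Input files for the toy conversion, relative to the repository root.
check_inputs data/fastqtl/FastQTLSumStats.list data/fastqtl/GTEx_genes.txt ||
  echo "run this from the root of the gtexresults repository" >&2
```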
Having followed the above steps to set up the Docker container on your computer, the data conversion can be carried out with the following command:
fastqtl2mash-docker sos run workflows/fastqtl_to_mash.ipynb \
--data_list data/fastqtl/FastQTLSumStats.list \
--gene_list data/fastqtl/GTEx_genes.txt
If successful, this command will write several files to a newly created directory, fastqtl_to_mash_output. One file, FastQTLSumStats.mash.rds, contains the eQTL summary statistics in an RDS file, which is easily loaded into R; see help(readRDS) in R for details. For more information about the contents of this file, and how they can be provided as input to the mash methods using the set_mash_data function, see the documentation inside the fastqtl2mash notebook and the vignettes in the mashr package.
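A quick way to confirm that the RDS file was written and is readable is to print its top-level structure from the shell. This sketch assumes that R (with Rscript on the PATH) is installed on the host; the helper name summarize_rds is hypothetical, and no assumption is made about what the file contains.

```shell
out=fastqtl_to_mash_output/FastQTLSumStats.mash.rds

# Hypothetical helper: print the top-level structure of an RDS file with R,
# or fail if the file is missing or empty, or if Rscript is unavailable.
summarize_rds() {
  [ -s "$1" ] || {
    echo "not found or empty: $1" >&2
    return 1
  }
  command -v Rscript >/dev/null 2>&1 || {
    echo "Rscript not found on PATH" >&2
    return 1
  }
  # The file path is passed to R as a trailing command-line argument.
  Rscript -e 'str(readRDS(commandArgs(trailingOnly = TRUE)[1]), max.level = 1)' "$1"
}

summarize_rds "$out" || echo "could not inspect $out" >&2
```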
All containers that have run and exited are retained in the Docker system. Run docker container list --all to list all previously run containers. To clear these containers, run docker container prune. See here for more information.
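To see how many exited containers have accumulated before pruning, the listing can be filtered by status and counted. This is a minimal sketch; the function name count_exited_containers is hypothetical.

```shell
# Hypothetical helper: count containers whose status is "exited".
# --quiet prints one container ID per line, so the line count is the total.
count_exited_containers() {
  docker container list --all --quiet --filter status=exited | wc -l | tr -d ' '
}
```

Exited containers are among the stopped containers that docker container prune removes, so a large count here suggests that pruning is worthwhile.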
The conversion procedure has several options which were not illustrated in the example above. View the fastqtl_to_mash.ipynb file in Jupyter, or in your Web browser here, for more details about the available options, specifications of the input files, and other usage information.
Converting the full GTEx data set is computationally intensive and is best done in a high-performance computing environment configured to run the workflow across multiple compute nodes. See here for details.
This reproducible R Markdown analysis was created with workflowr 1.0.1.9000