Last updated: 2018-06-01



Overview

We provide code to convert association statistics in FastQTL format, or a format similar to FastQTL, to a format that is more suited for analysis with mash. This code was used to generate MatrixEQTLSumStats.Portable.Z.rds in the git repository from the SNP-gene association statistics included as part of Release 6 of the GTEx Project (the source file was named GTEx_Analysis_V6_all-snp-gene-associations.tar).
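The conversion itself is done by the fastqtl_to_mash.ipynb notebook described below. To illustrate the kind of per-pair transformation involved, here is a minimal sketch that computes a z-score as an effect estimate divided by its standard error. The input column layout it assumes (gene_id, variant_id, tss_distance, pval_nominal, slope, slope_se, with one header line per file) and the function name fastqtl_zscores are hypothetical. Check your own files before relying on it, and note that it is not the notebook's code.

```shell
# Hypothetical helper, NOT the notebook's implementation.
# ASSUMED input: whitespace-separated columns, one header line per file:
#   gene_id variant_id tss_distance pval_nominal slope slope_se
# Output: tab-separated gene_id, variant_id, z = slope / slope_se.
# Rows with a zero standard error are skipped.
fastqtl_zscores() {
  awk 'BEGIN { OFS = "\t"; print "gene_id", "variant_id", "z" }
       FNR > 1 && $6 != 0 { print $1, $2, $5 / $6 }' "$@"
}

# Example usage with a (hypothetical) gzipped input file on stdin:
#   gzip -dc example.txt.gz | fastqtl_zscores
```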

Here we give instructions for using this code, and demonstrate how to convert a toy FastQTL data set. This toy data set is included in the git repository.

To facilitate running our conversion procedure, we have also developed a Docker container that includes all the required software components, notably the HDF5 libraries used to create intermediate data files that can be efficiently queried. Docker can run on most popular operating systems (Mac, Windows and Linux) and cloud computing services such as Amazon Web Services and Microsoft Azure. If you have not used Docker before, you might want to read this to learn the basic concepts and understand the main benefits of Docker.

For details on how the Docker image was configured, see hdf5tools.dockerfile in the workflows directory of the git repository. The Docker image used for our analyses is based on gaow/lab-base, a customized Docker image for development with R and Python.

If you find a bug in any of these steps, please post an issue.

Download and install Docker

Download Docker (note that a free community edition of Docker is available), and install it following the instructions provided on the Docker website. Once you have installed Docker, check that Docker is working correctly by following Part 1 of the “Getting Started” guide. If you are new to Docker, we recommend reading the entire “Getting Started” guide.

Note: Setting up Docker requires that you have administrator access to your computer. Singularity is an alternative that accepts Docker images and does not require administrator access.

Download and test Docker image

Run this alias command in the shell, which will be used below to run commands inside the Docker container:

alias fastqtl2mash-docker='docker run --security-opt label:disable -t '\
'-P -h MASH -w $PWD -v $HOME:/home/$USER -v /tmp:/tmp -v $PWD:$PWD '\
'-u $UID:${GROUPS[0]} -e HOME=/home/$USER -e USER=$USER gaow/hdf5tools'
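Bash expands aliases only in interactive shells (unless `shopt -s expand_aliases` is set), so the alias will not work inside a script. A shell function is a common alternative. The sketch below is a hypothetical equivalent named fastqtl2mash_docker. It uses an underscore instead of a hyphen so that the name is a valid POSIX function name. It also quotes the paths, so directories containing spaces are passed intact. Finally, it uses `id -u` and `id -g` in place of the Bash-only `$UID` and `${GROUPS[0]}`, which normally give the same values.

```shell
# Hypothetical function version of the fastqtl2mash-docker alias.
# Options are the same as in the alias; "$@" forwards the command to run
# inside the gaow/hdf5tools container (e.g. `uname -sn`).
fastqtl2mash_docker() {
  docker run --security-opt label:disable -t \
    -P -h MASH -w "$PWD" \
    -v "$HOME:/home/$USER" -v /tmp:/tmp -v "$PWD:$PWD" \
    -u "$(id -u):$(id -g)" -e HOME="/home/$USER" -e USER="$USER" \
    gaow/hdf5tools "$@"
}

# Usage, from a script or an interactive shell:
#   fastqtl2mash_docker uname -sn
```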

The -v flags in this command map directories between the standard computing environment and the Docker container. Since the analyses below will write files to these directories, it is important to ensure that:

If any of these statements are not true, please adjust the alias accordingly. The remaining options only affect operation of the container, and so should function the same regardless of your operating system.

Next, run a simple command in the Docker container to check that it has loaded successfully:

fastqtl2mash-docker uname -sn

This command will download the Docker image if it has not already been downloaded.

If the container ran successfully, you should see this information about the Docker container printed to the screen:

Linux MASH
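If you want to script this check, note that the `-t` option in the alias allocates a pseudo-terminal. As a result, captured output typically ends each line with a carriage return (`\r\n`). The hypothetical helper check_mash_banner below strips carriage returns before comparing the output with the expected `Linux MASH`. This is a sketch, not part of the gtexresults workflow.

```shell
# Hypothetical helper: succeed only if the captured output of `uname -sn`
# run in the container is exactly "Linux MASH". Carriage returns added by
# the pseudo-terminal (`docker run -t`) are removed before comparing.
check_mash_banner() {
  got=$(printf '%s' "$1" | tr -d '\r')
  [ "$got" = "Linux MASH" ]
}

# Usage (interactive shell, with the alias defined above):
#   check_mash_banner "$(fastqtl2mash-docker uname -sn)" && echo "container OK"
```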

You can also run these commands to show the information about the image downloaded to your computer and the container that has run (and exited):

docker image list
docker container list --all
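Both commands accept filters, which helps when many images or containers are present. The sketch below wraps two standard Docker CLI forms, `docker image list REPOSITORY` and `docker container list --all --filter ancestor=IMAGE`, in a hypothetical function. It shows only the gaow/hdf5tools image and the containers started from it.

```shell
# Hypothetical helper: show only the gaow/hdf5tools image and the
# containers (running or exited) created from it.
show_hdf5tools_state() {
  image=${1:-gaow/hdf5tools}
  docker image list "$image" &&
    docker container list --all --filter "ancestor=$image"
}

# Usage:
#   show_hdf5tools_state
```

Exited containers are kept until they are removed. `docker container prune` removes all stopped containers, not only those started from this image, so use it with care.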

Note: If you get the error “Cannot connect to the Docker daemon. Is the docker daemon running on this host?” in Linux or macOS, see here for Linux or here for Mac for suggestions on how to resolve this issue.
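When troubleshooting, it can help to tell a missing Docker installation apart from a daemon that is not running. In the hypothetical helper check_docker below, `command -v docker` tests whether the docker client is on the PATH. `docker info` then tests the connection, since it exits with a non-zero status when it cannot reach the daemon. This is a minimal sketch under those assumptions.

```shell
# Hypothetical pre-flight check.
# Returns 2 if the docker CLI is missing, 1 if the daemon is unreachable,
# and 0 (printing "docker OK") otherwise.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 2
  fi
  if ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the daemon is not reachable" >&2
    return 1
  fi
  echo "docker OK"
}
```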

Clone or download the gtexresults repository

Clone or download the gtexresults repository to your computer, then change your working directory in the shell to the root of the repository, e.g.,

cd gtexresults

All the commands below will be run from this directory.
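Before running the conversion, you can confirm that the shell is in the right directory by checking for the notebook and input files named in the next section. The helper in_gtexresults_root below is a hypothetical sketch that only tests for paths mentioned in this guide. It does not validate the files' contents.

```shell
# Hypothetical helper: succeed if the current directory contains the
# notebook and toy input files used in the conversion step.
in_gtexresults_root() {
  for f in workflows/fastqtl_to_mash.ipynb \
           data/fastqtl/FastQTLSumStats.list \
           data/fastqtl/GTEx_genes.txt; do
    if [ ! -f "$f" ]; then
      echo "missing $f: run this from the gtexresults root" >&2
      return 1
    fi
  done
}

# Usage:
#   in_gtexresults_root && echo "in repository root"
```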

Convert eQTL summary statistics

Next, use the fastqtl_to_mash.ipynb code in the workflows directory to convert the toy data set in FastQTL format to the mash format. The toy data are stored in the data/fastqtl subdirectory of the git repository.

Once you have set up the Docker container as described above, you can carry out the data conversion with the following command:

fastqtl2mash-docker sos run workflows/fastqtl_to_mash.ipynb \
  --data_list data/fastqtl/FastQTLSumStats.list \
  --gene_list data/fastqtl/GTEx_genes.txt

If successful, this command will write several files to a newly created directory, fastqtl_to_mash_output. The file FastQTLSumStats.mash.rds contains the eQTL summary statistics in an RDS file, which is easily loaded into R; see help(readRDS) in R for details. For more information about the contents of this file, and how they can be provided as input to the mash methods using the set_mash_data function, see the documentation inside the fastqtl_to_mash.ipynb notebook and the vignettes in the mashr package.
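To confirm that the conversion produced its main output, you can check that the RDS file exists and is non-empty. The helper check_mash_output below is a hypothetical sketch. The commented Rscript line shows one way to inspect the top level of the object with base R's readRDS and str, assuming R is installed on your computer.

```shell
# Hypothetical post-run check: the conversion should leave a non-empty
# RDS file in fastqtl_to_mash_output/. Pass another path to check a
# different file.
check_mash_output() {
  rds=${1:-fastqtl_to_mash_output/FastQTLSumStats.mash.rds}
  if [ -s "$rds" ]; then
    echo "found $rds"
  else
    echo "missing or empty: $rds" >&2
    return 1
  fi
}

# Usage, followed by a look at the object's top-level structure in R:
#   check_mash_output &&
#     Rscript -e 'str(readRDS("fastqtl_to_mash_output/FastQTLSumStats.mash.rds"), max.level = 1)'
```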

Additional usage notes


This reproducible R Markdown analysis was created with workflowr 1.0.1.9000