I imagine most users of GEMINI have built fairly complicated sets of queries to handle their needs. GEMINI provides a large set of built in tools as well as a very powerful SQL-based custom query builder.

For each of our exomes and WGS, we use many of the built in GEMINI tools as well as a custom query to identify deleterious variants in the ACMG 59 gene list.

In the Quick Start vignette we see how SeeGEM can be used to quickly and easily create the SeeGEM document for a single query.

In this vignette we will see how to bundle together a complicated set of queries and a peddy quality control output into a SeeGEM document.

Most importantly, this approach is fully scriptable and can be easily fit into a pipeline.

A working script that makes a large set of GEMINI queries and creates SeeGEM output is included in this package. It can be copied to your home directory as follows:

system(paste('cp', 
             system.file("example_scripts/GEMINI_db_to_SeeGEM.R", package="SeeGEM"), 
             '~/GEMINI_db_to_SeeGEM.R'))

It takes four inputs:

  1. The path to the GEMINI database
  2. The name of the family (from the ped file) to be analyzed
  3. The path and name of your output SeeGEM html file
  4. The path and prefix of your peddy output

You could run it as follows on the bash command line:

Rscript ~/GEMINI_db_to_SeeGEM.R Family001 /path/to/your/GEMINI.db ~/See_GEM_Family001.html /path/to/peddy_data/YOUR_PEDDY_PREFIX

If you open up the script in Rstudio (or whatever IDE you prefer), you will see that the script creates a list object then runs a series of GEMINI queries with the and and stores the output in the list object.

These are fairly lightweight wrappers and they only do three things to the GEMINI data:

  1. Import the data with the data.table fread function
  2. Coerce the chrom column to character to avoid possible 1,2,…X,Y issues
  3. Add a new column test which labels what kind of GEMINI query/test was run

At the end, the list data is turned into a single data frame with data.table function rbindlist.

If you have your own functioning python/bash scripts creating GEMINI queries already, you probably can simply import the data into R as follows:

library(SeeGEM)
your_gemini_data <- fread('path_to_your_gemini_output.tsv')
your_gemini_data$test <- 'WHAT TEST YOU RAN'
your_gemini_data$chrom <- as.character(your_gemini_data$chrom) # coerce to character

Then you can use your_gemini_data in the function as the GEMINI_data like so:

knit_see_gem(GEMINI_data = your_gemini_data, sample_name = 'Proband 001')