MLGWAS: Machine Learning for Genome-Wide Association Studies

Olivier M. F. Martin

2016-12-10

Table of Contents

  1. Introduction
  2. Installation and Citation
  3. Importing Clinical and Genomics Data into MDT Objects
  4. Annotating MDT Objects
  5. Manipulating and Plotting MDT Objects
  6. MLGWAS Objects - Training Classifiers
  7. Session Info

1. Introduction

Machine learning for Genome-Wide Association Studies (GWAS) is still in its infancy and no software suite we know of currently exists. MachineLearningGWAS is a package for R meant to meet this need. It is freely available at . This package depends on the following R packages: caret for machine learning, ggplot2 for plotting and data.table for data manipulation. It also depends on various Bioconductor packages. Briefly, it provides an interface to:

  1. Import VCF file and clinical data into an R S4 object named MDT and based on the data.table package;

  2. Annotate variants using Bioconductor annotations packages;

  3. Estimate different statistics for variants such as p-values for Hardy-Weinberg equilibrium tests or association tests as well as Minor Allele Frequency (MAF).

  4. Filter out variants or samples;

  5. Use this data to train classifiers using caret package, and organize results (classifiers, predictions, errors, feature importance) in one R S4 object named MLGWAS and based on the data.table package;

  6. Plot and analyze genomics and classification data using the ggplot2 package.

2. Installation and Citation

To install the MachineLearningGWAS, install R and RStudio, and then run the following code in the R console:

# install.packages("devtools")
devtools::install_github("olivmrtn/MachineLearningGWAS")

To load package to namespace, run the following code:

library(MachineLearningGWAS)
citation("MachineLearningGWAS")

To cite MachineLearningGWAS in publications use:

  Olivier M. F. Martin (2016). MachineLearningGWAS: Machine
  Learning for Genome-Wide Association Studies URL
  https://www.github.com/olivmrtn/MachineLearningGWAS/

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {{MachineLearningGWAS}: Machine Learning for Genome-Wide Association Studies},
    author = {Olivier M. F. Martin},
    year = {2016},
    date = {1996},
    url = {https://www.github.com/olivmrtn/MachineLearningGWAS/},
  }

3. Creating MDT Objects - Importing Clinical and Genomics Data

Throughout this tutorial, we will use simulated data. Genomics data is the VCF file snps.vcf file and clinical data is tab-delimited file clinical.txt. These can be retrieved using the following commands.

genotype_file <- system.file("extdata", "snps.vcf", 
                             package = "MachineLearningGWAS")
clinical_file <- system.file("extdata", "clinical.txt", 
                             package = "MachineLearningGWAS")

Briefly, the genomics data is composed of 200 samples and 1000 SNPs, out of which only 5 of them are truly associated with the response. These are indicated in the AS field with a 1. Genotypes are in the GT field. The data is also comprised of dbSNP rs# identifier, and quality QUAL and sequencing depth DP.

Clinical data specifies the disease status of the patient (RESPONSE), as well as age (AGE), sex (SEX) and patient identifier (SampleID).

Currently, it is only possible to import genomics data as a list of VCF objects of the VariantAnnotation package. This can be done using the readVcf function. Alternatively, if your data is divided into more than one VCF files, you can use the readVcfDir function in this package.

vcf <- VariantAnnotation::readVcf(genotype_file)
mdt <- vcfsToMDT(list(vcf))

Clinical data can be imported using the importPhenotype function. The response_type argument specifies if the RESPONSE should be treated as a categorical or continuous variable and sep specifies the file separator.

mdt <- importPhenotype(mdt, clinical_file, response_type = "factor", sep = "\t")

Printing the objects shows its contents.

mdt
--- MDT object ---
mtable(x):      200 individuals and 1000 features
annotations(x): 1000 rows in 5 columns
info(x):        200000 rows in 7 columns
response(x):    factor 0: 101 | 1: 99 
phenotype(x):   200 rows in 4 columns

An MDT object is composed of four main data.table: mtable, annotations, info and phenotype. These are interlinked by two identifiers (i.e. keys): SampleID and FeatureID.

SampleIDs and FeatureIDs can be retrieve using samples and features.

head(MachineLearningGWAS::samples(mdt))
[1] "Sample1"   "Sample10"  "Sample100" "Sample101" "Sample102" "Sample103"
head(MachineLearningGWAS::features(mdt))
[1] "1"    "10"   "100"  "1000" "101"  "102" 

The mtable data.table contains genotypes coded as integers 0, 1 and 2 with the following meanings: 0: homozygote for the reference allele; 1: heterozygote; 2: homozygote for the alternative allele. Superior values correspond to other types of genotypes.

head(mtable(mdt))
   FeatureID  SampleID VALUE
1:         1   Sample1     1
2:         1  Sample10     0
3:         1 Sample100     0
4:         1 Sample101     0
5:         1 Sample102     0
6:         1 Sample103     1

The annotations data.table contains annotations about features. This can be expanded using the annotate* functions as we will in the next section.

head(annotations(mdt))
   FeatureID CHROM      POS STRAND REF ALT
1:         1    19   295180      *   C   T
2:        10    19  1615315      *   C   T
3:       100    19 18059681      *   A   G
4:      1000    22 51181851      *   C   T
5:       101    19 18094546      *   A   G
6:       102    19 18222131      *   C   T

The info data.table contains annotations that are dependent on features and samples such as quality measures (QUAL) or genotypes (GT).

head(MachineLearningGWAS::info(mdt))
   FeatureID  SampleID paramRangeID     QUAL FILTER INFO_NS INFO_DP
1:         1   Sample1           NA 46.01215      .     200     274
2:         1  Sample10           NA 46.01215      .     200     274
3:         1 Sample100           NA 46.01215      .     200     274
4:         1 Sample101           NA 46.01215      .     200     274
5:         1 Sample102           NA 46.01215      .     200     274
6:         1 Sample103           NA 46.01215      .     200     274
   INFO_AS  GT
1:       0 0/1
2:       0 0/0
3:       0 0/0
4:       0 0/0
5:       0 0/0
6:       0 0/1

The phenotype data.table contains clinical data about samples as imported by importPhenotype.

head(phenotype(mdt))
    SampleID RESPONSE AGE SEX
1:   Sample1        1  69   1
2:  Sample10        0  66   1
3: Sample100        1  69   0
4: Sample101        1  61   0
5: Sample102        0  62   0
6: Sample103        1  80   1

4. Annotating MDT Objects

All annotations functions can be found by typing help(annotateMDT) in the R console. These functions append columns to the annotations data.table.

For example, we can append an univariate independence test and Hardy-Weinberg (HW) equilibrium p-values, as well as Minor Allele Frequency (MAF).

## Annotate Fisher's test and HW equilibrium p-values
mdt <- annotateStat(mdt, function_list = list(FISHER = function(x, y) {    
  stats::fisher.test(table(factor(x, c(0L, 1L, 2L)), 
                           factor(y, c(0L, 1L))))$p.value },
  HW = function(x, y) {    
    HardyWeinberg::HWExact(table(factor(x, c(0L, 1L, 2L))), 
                           verbose = FALSE)$pval
  }), 
  fill = NA, verbose = TRUE)

# Annotate MAF
mdt <- annotateMAF(mdt)

Entrez identifiers can be added using the annotateLocation function. This function is dependant on a TxDb object from Bioconductor. Here, we will be using the TxDb.Hsapiens.UCSC.hg19.knownGene package.

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
mdt <- annotateLocation(mdt, TxDb.Hsapiens.UCSC.hg19.knownGene)

These Entrez identifiers can be used to query different Bioconductor annotation databases such as org.Hs.eg.db using annotateBioconductor. Here under we append gene names and associated Gene Ontology (GO) terms.

library(org.Hs.eg.db)
mdt <- annotateBioconductor(mdt, 
                            annotations_db = org.Hs.eg.db, 
                            keys_colname = "ENTREZID", 
                            columns = c("GENENAME", "GO"))

Finally, annotateCoding allows annotating consequence of mutations and annotateRSID to annotate dbSNP rs identifiers.

library(BSgenome.Hsapiens.UCSC.hg19)
library(SNPlocs.Hsapiens.dbSNP144.GRCh37)
mdt <- annotateCoding(mdt, 
                      BSgenome.Hsapiens.UCSC.hg19,
                      TxDb.Hsapiens.UCSC.hg19.knownGene)

mdt <- annotateRSID(mdt, SNPlocs.Hsapiens.dbSNP144.GRCh37)

Let’s see results.

head(annotations(mdt))
   FeatureID ENTREZID CHROM     POS STRAND REF ALT    FISHER         HW
1:         1       NA    19  295180      *   C   T 0.2541678 0.05783275
2:        10     6929    19 1615315      *   C   T 0.6448435 0.30891569
3:        10     6929    19 1615315      *   C   T 0.6448435 0.30891569
4:        10     6929    19 1615315      *   C   T 0.6448435 0.30891569
5:        10     6929    19 1615315      *   C   T 0.6448435 0.30891569
6:        10     6929    19 1615315      *   C   T 0.6448435 0.30891569
      MAF   LOCATION               GENENAME         GO EVIDENCE ONTOLOGY
1: 0.0900 intergenic                     NA         NA       NA       NA
2: 0.2975     intron transcription factor 3 GO:0000122      IDA       BP
3: 0.2975     intron transcription factor 3 GO:0000790      IDA       CC
4: 0.2975     intron transcription factor 3 GO:0000978      IDA       MF
5: 0.2975     intron transcription factor 3 GO:0001078      IDA       MF
6: 0.2975     intron transcription factor 3 GO:0002326      NAS       BP
   CONSEQUENCE_ALT CONSEQUENCE        RSID
1:              NA          NA rs547079193
2:               T  synonymous rs147690653
3:               T  synonymous rs147690653
4:               T  synonymous rs147690653
5:               T  synonymous rs147690653
6:               T  synonymous rs147690653

5. Manipulating and Plotting MDT Objects

Manipulating

We will present five functions allowing us to manipulate MDT objects: selectMDT, aggregateMDT,[,filterMDT, andfillMDT`.

aggregateMDT aggregate FeatureIDs into Higher Level IDs, for example, Entrez identifiers or GO terms. It requires a data.frame specifying how to aggregate FeatureIDs. The first column is the old FeatureID, the second is the new FeatureID. These can be easily retrieved using selectMDT.

df <- selectMDT(mdt, MachineLearningGWAS::features(mdt), "ENTREZID")
head(df)
   FeatureID ENTREZID
1:        10     6929
2:      1000       49
3:       101     3780
4:       102    23031
5:       103    23031
6:       104    23031

This data.frame can be used to aggregate the MDT objects. As you can see, the FeatureID corresponds to Entrez identifiers.

ag_mdt <- aggregateMDT(mdt, df, fun.aggregate = mean)
ag_mdt
--- MDT object ---
mtable(x):      200 individuals and 481 features
annotations(x): 481 rows in 0 columns
info(x):        96200 rows in 0 columns
response(x):    factor 0: 101 | 1: 99 
phenotype(x):   200 rows in 4 columns
head(mtable(ag_mdt))
   FeatureID  SampleID VALUE
1: 100128496   Sample1     1
2: 100128496  Sample10     1
3: 100128496 Sample100     0
4: 100128496 Sample101     0
5: 100128496 Sample102     1
6: 100128496 Sample103     0

The [ and filterMDT are subsetting functions. [ subsets according to SampleIDs and FeatureIDs whereas filterMDT according to some condition and data in one of the data.tables.

For example, here we keep only FeatureIDs 1 and 2, and SampleIDs Sample1 and Sample2 These MUST be characters. Don’t forget about samples and features.

mdt[c("Sample1", "Sample2"), c("1", "2")]
--- MDT object ---
mtable(x):      2 individuals and 2 features
annotations(x): 5 rows in 17 columns
info(x):        4 rows in 7 columns
response(x):    factor 0: 1 | 1: 1 
phenotype(x):   2 rows in 4 columns

On another hand, we can use filterMDT to keep only non-synonymous variants and remove those with low quality.

mdt <- filterMDT(mdt, QUAL > 50, "info")
mdt
--- MDT object ---
mtable(x):      200 individuals and 945 features
annotations(x): 8671 rows in 17 columns
info(x):        189000 rows in 7 columns
response(x):    factor 0: 101 | 1: 99 
phenotype(x):   200 rows in 4 columns

Finally, if there are any missing entries, fillMDT can be used. For example, we can assume that missing values were homozygotes for the REF allele.

fillMDT(mdt, fill = 0L)
--- MDT object ---
mtable(x):      200 individuals and 945 features
annotations(x): 8671 rows in 17 columns
info(x):        189000 rows in 7 columns
response(x):    factor 0: 101 | 1: 99 
phenotype(x):   200 rows in 4 columns

Plotting

Plotting for MDT is possible using plotMDT. Available plots are scatterplots and heatmaps. Preprocessing such as scaling, centering, and PCA and be applied beforehand.

Here under is a scatter plot of the first two principal components in sample space.

plotMDT(mdt, 
        type = "scatter", 
        preprocess = c("scale", "center", "pca"), 
        space = c("samples"), 
        dims = c(1L, 2L))

Here under is a heatmap plot of associated SNPs. As you can see, there is a good separation between cases (blue) and controls (red).

try(plotMDT(filterMDT(mdt, INFO_AS == 1, "info"),
        type = "heatmap", 
        preprocess = c("scale", "center")))

Finally, it is possible to create Manhattan plots using any numeric value in annotations. Here, we will illustrate this using results from Fisher’s exact test we have computed using annotateStat

manhattanPlot(mdt, "FISHER")

6. MLGWAS Objects - Training Classifiers

MDT objects can be used to train classifiers; results are contained in MLGWAS objects. This is possible using trainClassifier.

This function can be thought of as a wrapper for the train function in the caret package. By default, the function is composed of two main loops corresponding to double fold cross-validation. The outer fold is specified by the partitions argument, a list of indices which define the training set. The training set is used for preprocessing (a function specified by the preprocess argument) and as input for train. The test set is all other indices; it is neither used preprocessing, training and will be used to estimate performance measures. It is, however, possible to set the test set as the whole dataset by setting validation to FALSE. The inner fold is handled by train and can be set using its trControl and tuneGrid argument. Actually, any argument can be passed on to train.

Here under we train two classifiers using random forest (method = 'rf'). To make this run faster, we will only retain the associated SNPs and ten others. We will also use variables contained in phenotype (except RESPONSE).

allfeat <- MachineLearningGWAS::features(mdt)
associated <- unique(MachineLearningGWAS::info(mdt)[INFO_AS == 1, FeatureID])
random <- sample(allfeat[! allfeat %in% associated], 15)
selected <- c(associated, random)
mdt <- mdt[, selected]

clas <- trainClassifier(mdt, id = "clas", phen_vars = "all", method = 'rf')

An MLGWAS object is composed of three main data.table: performances, featimp, errors. These are interlinked by fours identifiers (i.e. keys). SampleID and FeatureID are extracted from the MDT objects. ClassifierID is determined by the id argument when calling trainClassifier. Finally, PartitionID is determined by the names of the partition argument. Printing the MLGWAS object reveals its identifiers. It should be noted that each of these data.table as an associated summary and plot function.

clas
--- MLGWAS object ---
1 classifiers: clas
10 partitions: Fold01.Rep1, Fold02.Rep1, Fold03.Rep1, Fold04.Rep1, Fold05.Rep1, ...
22 features: 105, 114, 135, 18, 30, ...
200 samples: Sample1, Sample10, Sample100, Sample101, Sample102, ...

performances contains estimates about classifier performance across different folds (partitions). It can be summarized using summaryPerf

performances(clas)
    ClassifierID PartitionID  Accuracy    ROCAUC       PPV       NPV
 1:         clas Fold01.Rep1 0.8947368 0.8888889 0.8333333 1.0000000
 2:         clas Fold02.Rep1 0.8500000 0.8500000 0.8181818 0.8888889
 3:         clas Fold03.Rep1 1.0000000 1.0000000 1.0000000 1.0000000
 4:         clas Fold04.Rep1 0.9523810 0.9545455 1.0000000 0.9090909
 5:         clas Fold05.Rep1 1.0000000 1.0000000 1.0000000 1.0000000
 6:         clas Fold06.Rep1 0.9500000 0.9500000 0.9090909 1.0000000
 7:         clas Fold07.Rep1 0.9500000 0.9500000 1.0000000 0.9090909
 8:         clas Fold08.Rep1 0.9500000 0.9500000 0.9090909 1.0000000
 9:         clas Fold09.Rep1 0.9000000 0.9000000 0.9000000 0.9000000
10:         clas Fold10.Rep1 0.9000000 0.9000000 1.0000000 0.8333333
    Sensitivity Specificity   F1Score
 1:   1.0000000   0.7777778 0.9090909
 2:   0.9000000   0.8000000 0.8571429
 3:   1.0000000   1.0000000 1.0000000
 4:   0.9090909   1.0000000 0.9523810
 5:   1.0000000   1.0000000 1.0000000
 6:   1.0000000   0.9000000 0.9523810
 7:   0.9000000   1.0000000 0.9473684
 8:   1.0000000   0.9000000 0.9523810
 9:   0.9000000   0.9000000 0.9000000
10:   0.8000000   1.0000000 0.8888889
summaryPerf(clas)
   ClassifierID Summary    Accuracy      ROCAUC         PPV         NPV
1:         clas    mean  0.93471178  0.93434343  0.93696970  0.94404040
2:         clas      sd  0.04798424  0.04865291  0.07275252  0.06266101
3:         clas     len 10.00000000 10.00000000 10.00000000 10.00000000
   Sensitivity Specificity    F1Score
1:  0.94090909  0.92777778  0.9359634
2:  0.06939989  0.08642416  0.0466479
3: 10.00000000 10.00000000 10.0000000

featimp contains feature importance across folds. These values are obtained calling varImp for caret. It can be summarized using summaryFeatimp

head(featimp(clas))
   ClassifierID PartitionID FeatureID  Importance
1:         clas Fold01.Rep1       105 6.581981806
2:         clas Fold01.Rep1       114 0.008179545
3:         clas Fold01.Rep1       135 1.949714639
4:         clas Fold01.Rep1        18 9.055058882
5:         clas Fold01.Rep1        30 4.795876359
6:         clas Fold01.Rep1       350 7.684334815
head(summaryFeatimp(clas))
   ClassifierID FeatureID      mean        sd len
1:         clas       105 5.3855108 0.7992505  10
2:         clas       114 0.0679622 0.1305055  10
3:         clas       135 1.6748299 0.7568015  10
4:         clas        18 7.9673276 0.7320299  10
5:         clas        30 5.5747477 0.4263509  10
6:         clas       350 7.6375052 0.4464784  10

Finally, errors represent local residuals. It can help identify samples that are too predict. For categorical responses, 0 represents a success and 1 an error. It can be summarized using summaryErrors

head(na.omit(errors(clas)))
   ClassifierID  SampleID PartitionID Error
1:         clas   Sample1 Fold02.Rep1     0
2:         clas  Sample10 Fold06.Rep1     0
3:         clas Sample100 Fold03.Rep1     0
4:         clas Sample101 Fold01.Rep1     0
5:         clas Sample102 Fold01.Rep1     0
6:         clas Sample103 Fold10.Rep1     0
head(summaryErrors(clas))
   ClassifierID  SampleID mean sd len
1:         clas   Sample1    0 NA   1
2:         clas  Sample10    0 NA   1
3:         clas Sample100    0 NA   1
4:         clas Sample101    0 NA   1
5:         clas Sample102    0 NA   1
6:         clas Sample103    0 NA   1

One may with to compare performance results with a random setting. This can be done using permutations. We then bind the two objects together using bindML.

perm <- trainClassifier(mdt, id = "perm",
                        method = 'rf', phen_vars = "all", 
                        permute = TRUE)
fits <- bindML(clas, perm)

There exists different three main functions to plot MLGWAS objects: plotPerf, plotFeatimp and plotErrors.

plotPerf plots performance measures. Here under we plot accuracy values. As expected, the classifier using the unpermuted data (clas) outperforms random predictions (perm).

plotPerf(fits, "Accuracy")

plotFeatimp plots feature importance. Associated features are “555”, “615”, “666”, “770”, “917” and “AGE”. It so happens, these are the features being selected by our classifier. The permuted classifier, on the contrary, seams to be selecting features at random.

plotFeatimp(fits)

Finally, it possible to look at residual errors (i.e. per sample error) using plotErrors.

It is also possible to create different types of models by playing around with arguments. One should also not that any argument in train can be passed on. For example, we can do 10 bootstraps instead of double fold cross validation with the following code.

ml <- trainClassifier(mdt, id = "boot", 
                      partition = caret::createResample(response(mdt), 
                                                        times = 10), 
                      trControl = trainControl("none"), 
                      tuneGrid = data.frame(.mtry = 5L),
                      validation = FALSE)

It also possible to create a preprocessing function that only selects the 5 most associated values as determined by Fisher’s exact test as follows. This can then be used as input to the preprocess argument.

fisherSelector <- function(x, y) {
  pv <- apply(x, 2, function(col) {
    fisher.test(table(factor(col, 0:2), y))$p.value
  })
  pv <- sort(pv)
  selected <- names(pv)[1:5]
  function(x) x[, selected]
}

Finally, the trainClassifiers function allow to call trainClassifier iteratively using a data.frame of parameters. For example, instead of calling two times trainClassifer to get a real and a permuted classifier, one can run this code. Note: functions must be defined within a vector.

params <- expand.grid(method = "rf", 
                      permuted = c(TRUE, FALSE), 
                      preprocess = c(fisherSelector))
params$id <- c("perm", "real")
fits2 <- trainClassifiers(mdt, params)

7. Session Info

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] caret_6.0-73                            
 [2] ggplot2_2.2.0                           
 [3] lattice_0.20-34                         
 [4] randomForest_4.6-12                     
 [5] SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.11
 [6] BSgenome.Hsapiens.UCSC.hg19_1.4.0       
 [7] BSgenome_1.42.0                         
 [8] rtracklayer_1.34.1                      
 [9] org.Hs.eg.db_3.4.0                      
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
[11] GenomicFeatures_1.26.0                  
[12] AnnotationDbi_1.36.0                    
[13] VariantAnnotation_1.20.1                
[14] Rsamtools_1.26.1                        
[15] Biostrings_2.42.0                       
[16] XVector_0.14.0                          
[17] SummarizedExperiment_1.4.0              
[18] Biobase_2.34.0                          
[19] GenomicRanges_1.26.1                    
[20] GenomeInfoDb_1.10.1                     
[21] IRanges_2.8.1                           
[22] S4Vectors_0.12.0                        
[23] BiocGenerics_0.20.0                     
[24] MachineLearningGWAS_0.1.0               
[25] data.table_1.9.6                        

loaded via a namespace (and not attached):
 [1] nlme_3.1-128             bitops_1.0-6            
 [3] pbkrtest_0.4-6           devtools_1.12.0         
 [5] rprojroot_1.1            tools_3.3.1             
 [7] backports_1.0.4          KernSmooth_2.23-15      
 [9] rpart_4.1-10             DBI_0.5-1               
[11] lazyeval_0.2.0           mgcv_1.8-16             
[13] colorspace_1.3-1         nnet_7.3-12             
[15] withr_1.0.2              gridExtra_2.2.1         
[17] compiler_3.3.1           chron_2.3-47            
[19] quantreg_5.29            HardyWeinberg_1.5.6     
[21] mice_2.25                SparseM_1.74            
[23] labeling_0.3             caTools_1.17.1          
[25] scales_0.4.1             pbapply_1.3-1           
[27] stringr_1.1.0            digest_0.6.10           
[29] minqa_1.2.4              rmarkdown_1.2           
[31] DOSE_3.0.9               htmltools_0.3.5         
[33] lme4_1.1-12              RSQLite_1.0.0           
[35] BiocParallel_1.8.1       gtools_3.5.0            
[37] GOSemSim_2.0.1           ModelMetrics_1.1.0      
[39] car_2.1-3                RCurl_1.95-4.8          
[41] magrittr_1.5             GO.db_3.4.0             
[43] Matrix_1.2-7.1           Rcpp_0.12.8             
[45] munsell_0.4.3            pROC_1.8                
[47] stringi_1.1.2            yaml_2.1.14             
[49] MASS_7.3-45              zlibbioc_1.20.0         
[51] gplots_3.0.1             plyr_1.8.4              
[53] qvalue_2.6.0             grid_3.3.1              
[55] gdata_2.17.0             DO.db_2.9               
[57] splines_3.3.1            knitr_1.15.1            
[59] fgsea_1.0.2              igraph_1.0.1            
[61] reshape2_1.4.2           codetools_0.2-14        
[63] biomaRt_2.30.0           fastmatch_1.0-4         
[65] XML_3.98-1.5             evaluate_0.10           
[67] nloptr_1.0.4             foreach_1.4.3           
[69] MatrixModels_0.4-1       gtable_0.2.0            
[71] assertthat_0.1           e1071_1.6-7             
[73] class_7.3-14             survival_2.40-1         
[75] tibble_1.2               iterators_1.0.8         
[77] GenomicAlignments_1.10.0 memoise_1.0.0