pipeLIMMA Run analyses of differnetial expression in LIMMA.makeBinarySig Find and count the number of significantly differentially expressed genespqHists Plot the distribution of P and FDR-corrected P (Q) valuesvoom2PCA Conduct principal component analyses on voom (or otherwise) normalized expression matricesvolcanoPlot Generate log2 foldchange vs. P-value volcano plotsvolcanoPair Compare two sets of log2 fold changes, colored by p-valuespipeDESeq2 Runs an analysis of differential expression similar to that of pipeLIMMA, except through the DESeq2 package.The limmaDE2 packages is not on CRAN and can only be installed from github. Make sure the devtools package is installed on your system. The limmaDE2 package also requires several other R packages to be installed. These packages have functions upon which limmaDE2 depends:
limma contains all of the basic statistical elements that are called here.qvalue fdr-corrected p-valuesedgeR Differential expression package used for some transformationsSimSeq permits simulation of RNA-seq dataplyr summarize and manipulate lists and dataframesvegan some distance-based methodsHeatplus heatmap plottingtopGO gene ontology (GO) analysisggplot2 routines for complex plotshexbin methods to simplify data in plotslibrary(devtools)
install_github("jtlovell/limmaDE2")library("limmaDE2")
library("ggplot2")
library("reshape2")
library("SimSeq")
library("DESeq2")counts: Raw transcript abundance countsinfo: Experimental design information These two matrices must match exactly, where each row in the info matrix corresponds to each column in counts. For best performance, the name of each gene should be stored in the rownames of the counts matrix.data(kidney) # from simseq
counts<-kidney$counts
counts<-counts[sample(1:nrow(counts),1000),]
info<-with(kidney,
data.frame(id = paste(replic, treatment, sep = "_"),
rep=replic,
Treatment=ifelse(treatment == "Tumor","tumor","cntr"),
stringsAsFactors=F))
colnames(counts)<-info$idgroup.a<-c("4619", "4712", "4863", "4865", "5452", "5453", "5454", "5455",
"5456", "5457","5458", "5461", "5462", "5463", "5465", "5467",
"5468", "5469", "5470", "5549","5552", "5580", "5641", "5672",
"5689", "5701", "5703", "5706", "5989", "6088")
info$group<-ifelse(info$rep %in% group.a, "a","b")
with(info, table(group, Treatment))## Treatment
## group cntr tumor
## a 30 30
## b 42 42
info$trt.grp<-with(info, paste(Treatment, group, sep="_"))head(info)## id rep Treatment group trt.grp
## 1 3358_Non-Tumor 3358 cntr b cntr_b
## 2 3358_Tumor 3358 tumor b tumor_b
## 3 3387_Non-Tumor 3387 cntr b cntr_b
## 4 3387_Tumor 3387 tumor b tumor_b
## 5 4619_Non-Tumor 4619 cntr a cntr_a
## 6 4619_Tumor 4619 tumor a tumor_a
counts[1:10,1:3]## 3358_Non-Tumor 3358_Tumor 3387_Non-Tumor
## GSK3B|2932 2096 3973 2310
## TEC|7006 88 246 125
## RTBDN|83546 0 0 0
## SNX13|23161 3139 4764 2798
## MAST1|22983 7 1 0
## TFAM|7019 1436 1697 1550
## UBXN10|127733 401 1581 260
## KRT14|3861 0 28 2
## CINP|51550 993 1401 1104
## TACR1|6869 80 245 49
In all statistical analyses, it is important to set the levels of the experimental factors. This is esspecially true in linear modelling, such as limma, where levels are tested against the base level. Here, we set the “Non-Tumor” treatment and “a” group as the base.
info$Treatment <- factor(info$Treatment,
levels = c("cntr", "tumor"))
info$group <- factor(info$group,
levels = c("a", "b"))stats <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
formula = "~ Treatment")## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting linear model ...
## processing statistics and calculating q-values ...
lmStats<-stats$stats
voom<-stats$voom$EpipeLIMMA returns three elements: stats, voom and fstats. These are the statistical output of the limma functions: eBayes, voom and topTableF. Inspect the top few rows and columns of each.
stats: eBayes linear model statistics| gene | sigma | s2.post | Amean | Fstat | Fpvalue | |
|---|---|---|---|---|---|---|
| GSK3B|2932 | GSK3B|2932 | 0.8001060 | 0.6416126 | 10.281119 | 17.4070785 | 0.0000518 |
| TEC|7006 | TEC|7006 | 0.4726687 | 0.2330547 | 5.935553 | 47.5471968 | 0.0000000 |
| RTBDN|83546 | RTBDN|83546 | 1.2190479 | 1.4708851 | -1.752828 | 0.1332056 | 0.7156636 |
| SNX13|23161 | SNX13|23161 | 0.7140688 | 0.5138993 | 10.420350 | 23.7538159 | 0.0000028 |
| MAST1|22983 | MAST1|22983 | 1.2083085 | 1.4453294 | 1.163619 | 5.4418431 | 0.0210366 |
| TFAM|7019 | TFAM|7019 | 0.6495017 | 0.4275889 | 9.127099 | 166.6738831 | 0.0000000 |
voom: normalized expression matrix| 3358_Non-Tumor | 3358_Tumor | 3387_Non-Tumor | 3387_Tumor | 4619_Non-Tumor | 4619_Tumor | |
|---|---|---|---|---|---|---|
| GSK3B|2932 | 10.135067 | 10.202899 | 10.235435 | 10.402782 | 9.9837220 | 10.808928 |
| TEC|7006 | 5.568906 | 6.192148 | 6.032989 | 5.679887 | 5.2827638 | 6.604529 |
| RTBDN|83546 | -1.898700 | -2.753296 | -1.938555 | -2.690800 | -1.4587032 | -2.126790 |
| SNX13|23161 | 10.717619 | 10.464813 | 10.511883 | 10.409041 | 10.4515648 | 12.185029 |
| MAST1|22983 | 2.008191 | -1.168334 | -1.938555 | 1.557127 | 0.1262593 | 0.680565 |
| TFAM|7019 | 9.589642 | 8.975900 | 9.659963 | 8.963388 | 9.6426160 | 9.449222 |
fstats: F-statistics for each factor in the model, in this case, just treatment| gene | Treatment_F | Treatment_P.Value | Treatment_Q.Value |
|---|---|---|---|
| AATF|26574 | 128.433398 | 0.0000000 | 0.0000000 |
| ABCB8|11194 | 69.822932 | 0.0000000 | 0.0000000 |
| ABCC6P1|653190 | 2.194034 | 0.1407181 | 0.1820415 |
| ABCC6P2|730013 | 1.154329 | 0.2844313 | 0.3502849 |
| ACAD11|84129 | 49.899739 | 0.0000000 | 0.0000000 |
| ACCN2|41 | 96.479576 | 0.0000000 | 0.0000000 |
duplicateCorrelation among blocks, then uses the blocking variable in the linear model fit.info$block <- rep(1:2,each=nrow(info)/2)
stats.block <- pipeLIMMA(counts = counts,
info = info,
block = info$block,
formula = "~ Treatment")## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## calculating duplicate correlation among replicates ...
## fitting linear model ...
## processing statistics and calculating q-values ...
stats.factorial <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
formula = "~ Treatment + group + Treatment*group")## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting linear model ...
## processing statistics and calculating q-values ...
Sometimes, it may make more sense to use specific contrasts to test for differential expression. For example, let’s say we are interested in how the tumors and non-tumor tissues differ for each of the two groups.
design <- with(info, model.matrix(~ 0 + trt.grp))
colnames(design)<-gsub("trt.grp","",colnames(design))
head(design)## cntr_a cntr_b tumor_a tumor_b
## 1 0 1 0 0
## 2 0 0 0 1
## 3 0 1 0 0
## 4 0 0 0 1
## 5 1 0 0 0
## 6 0 0 1 0
contrast.matrix <- makeContrasts(
tumor_a - cntr_a ,
tumor_b - cntr_b,
levels = design)
head(contrast.matrix)## Contrasts
## Levels tumor_a - cntr_a tumor_b - cntr_b
## cntr_a -1 0
## cntr_b 0 -1
## tumor_a 1 0
## tumor_b 0 1
stats <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
design = design,
contrast.matrix = contrast.matrix)## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting model to contrast matrix ...
## processing statistics and calculating q-values ...
stats.contrasts <- stats$statsThe function makeBinarySig looks for a provided string (e.g. “Q.Value”) and outputs a matrix with whether or not those columns have values <= the provided alpha
sigs <- makeBinarySig(stats.contrasts, what = "Qvalue", alpha = 0.05)## ebayesQvalue_tumor_a...cntr_a 627
## ebayesQvalue_tumor_b...cntr_b 665
counts2Venn(x=sigs, cols=c(1,2), names=c("in.grpA","in.grpB"),
colors=c("blue","red"),type="limma", legx=-3.3,legy=-3)counts2Venn(x=sigs, cols=c(1,2), names=c("in.grpA","in.grpB"),
colors=c("blue","red"),type="Euler", legx=-3.3,legy=-3)Volcano plots are a good way to look at the differences between two experimental levels. Here, we compare the extent of differential expression between the “high” treatment to the “low” treatment.
with(lmStats, volcanoPlot(pval = ebayesPvalue_Treatmenttumor,
lfc = Treatmenttumor_logFC,
sig = ebayesQvalue_Treatmenttumor,
main = "no tumor vs. tumor Volcano Plot",
xlab = "tumor - no tumor Log2 Fold Change",
bty = "n", legpos = "top", leginset = c(0,-.1)))## sig
## updown FALSE TRUE
## down 130 385
## up 168 317
sigs <- data.frame(makeBinarySig(stats.contrasts, what = "Qvalue", alpha = 0.05))## ebayesQvalue_tumor_a...cntr_a 627
## ebayesQvalue_tumor_b...cntr_b 665
names(sigs)<-c("sig.a","sig.b")
sigs$sign.a<-sign(stats.contrasts$tumor_a...cntr_a_logFC)
sigs$sign.b<-sign(stats.contrasts$tumor_b...cntr_b_logFC)
cols<-with(sigs, ifelse(sig.a + sig.b == 0,
"grey",
ifelse(sig.a + sig.b == 2 & sign.a*sign.b == 1,
"pink",
ifelse(sig.a + sig.b == 2 & sign.a*sign.b == -1,
"cornflowerblue",
ifelse(sig.a == 1, "darkblue", "darkred")))))with(stats.contrasts, volcanoPair(lfc1 = tumor_a...cntr_a_logFC,
lfc2 = tumor_b...cntr_b_logFC,
pt.col = cols, pt.pch = 19, pt.cex=.5,
xlab = "Tumor - control LFC (group A)",
ylab = "Tumor - control LFC (group B)"))## pt.col
## cornflowerblue darkblue darkred grey pink
## 28 123 161 212 476
It is often a good idea to get an idea of how the individuals (libraries) are structured - how similar are they to eachother. Principal component analyses allow for this kind of inference.
pca12 <- counts2PCA(counts=voom, info = info, ids = info$id, pcas2return = 6)pca12.var <- pca12[[2]]
pca12 <- pca12[[1]]
gcols <- c("darkblue", "blue", "gold", "green", "pink", "red")
ggplot(pca12, aes(x = PC1, y = PC2, col = group, shape = Treatment)) +
geom_vline(xintercept = 0, lty = 2)+
geom_hline(yintercept = 0, lty = 2)+
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background = element_blank()) +
labs(x = paste("PCA1 (", pca12.var[1],"%)", sep = ""),
y = paste("PCA2 (", pca12.var[2],"%)", sep = ""),
title = "PCA analysis of voom-normalized expression")For more information, visit lovelleeb.weebly.com/analytics