pipeLIMMA
Run analyses of differnetial expression in LIMMA.makeBinarySig
Find and count the number of significantly differentially expressed genespqHists
Plot the distribution of P and FDR-corrected P (Q) valuesvoom2PCA
Conduct principal component analyses on voom (or otherwise) normalized expression matricesvolcanoPlot
Generate log2 foldchange vs. P-value volcano plotsvolcanoPair
Compare two sets of log2 fold changes, colored by p-valuespipeDESeq2
Runs an analysis of differential expression similar to that of pipeLIMMA, except through the DESeq2 package.The limmaDE2 packages is not on CRAN and can only be installed from github. Make sure the devtools
package is installed on your system. The limmaDE2 package also requires several other R packages to be installed. These packages have functions upon which limmaDE2 depends:
limma
contains all of the basic statistical elements that are called here.qvalue
fdr-corrected p-valuesedgeR
Differential expression package used for some transformationsSimSeq
permits simulation of RNA-seq dataplyr
summarize and manipulate lists and dataframesvegan
some distance-based methodsHeatplus
heatmap plottingtopGO
gene ontology (GO) analysisggplot2
routines for complex plotshexbin
methods to simplify data in plotslibrary(devtools)
install_github("jtlovell/limmaDE2")
library("limmaDE2")
library("ggplot2")
library("reshape2")
library("SimSeq")
library("DESeq2")
counts
: Raw transcript abundance countsinfo
: Experimental design information These two matrices must match exactly, where each row in the info
matrix corresponds to each column in counts
. For best performance, the name of each gene should be stored in the rownames of the counts matrix.data(kidney) # from simseq
counts<-kidney$counts
counts<-counts[sample(1:nrow(counts),1000),]
info<-with(kidney,
data.frame(id = paste(replic, treatment, sep = "_"),
rep=replic,
Treatment=ifelse(treatment == "Tumor","tumor","cntr"),
stringsAsFactors=F))
colnames(counts)<-info$id
group.a<-c("4619", "4712", "4863", "4865", "5452", "5453", "5454", "5455",
"5456", "5457","5458", "5461", "5462", "5463", "5465", "5467",
"5468", "5469", "5470", "5549","5552", "5580", "5641", "5672",
"5689", "5701", "5703", "5706", "5989", "6088")
info$group<-ifelse(info$rep %in% group.a, "a","b")
with(info, table(group, Treatment))
## Treatment
## group cntr tumor
## a 30 30
## b 42 42
info$trt.grp<-with(info, paste(Treatment, group, sep="_"))
head(info)
## id rep Treatment group trt.grp
## 1 3358_Non-Tumor 3358 cntr b cntr_b
## 2 3358_Tumor 3358 tumor b tumor_b
## 3 3387_Non-Tumor 3387 cntr b cntr_b
## 4 3387_Tumor 3387 tumor b tumor_b
## 5 4619_Non-Tumor 4619 cntr a cntr_a
## 6 4619_Tumor 4619 tumor a tumor_a
counts[1:10,1:3]
## 3358_Non-Tumor 3358_Tumor 3387_Non-Tumor
## GSK3B|2932 2096 3973 2310
## TEC|7006 88 246 125
## RTBDN|83546 0 0 0
## SNX13|23161 3139 4764 2798
## MAST1|22983 7 1 0
## TFAM|7019 1436 1697 1550
## UBXN10|127733 401 1581 260
## KRT14|3861 0 28 2
## CINP|51550 993 1401 1104
## TACR1|6869 80 245 49
In all statistical analyses, it is important to set the levels of the experimental factors. This is esspecially true in linear modelling, such as limma, where levels are tested against the base level. Here, we set the “Non-Tumor” treatment and “a” group as the base.
info$Treatment <- factor(info$Treatment,
levels = c("cntr", "tumor"))
info$group <- factor(info$group,
levels = c("a", "b"))
stats <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
formula = "~ Treatment")
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting linear model ...
## processing statistics and calculating q-values ...
lmStats<-stats$stats
voom<-stats$voom$E
pipeLIMMA
returns three elements: stats
, voom
and fstats
. These are the statistical output of the limma functions: eBayes
, voom
and topTableF
. Inspect the top few rows and columns of each.
stats
: eBayes linear model statisticsgene | sigma | s2.post | Amean | Fstat | Fpvalue | |
---|---|---|---|---|---|---|
GSK3B|2932 | GSK3B|2932 | 0.8001060 | 0.6416126 | 10.281119 | 17.4070785 | 0.0000518 |
TEC|7006 | TEC|7006 | 0.4726687 | 0.2330547 | 5.935553 | 47.5471968 | 0.0000000 |
RTBDN|83546 | RTBDN|83546 | 1.2190479 | 1.4708851 | -1.752828 | 0.1332056 | 0.7156636 |
SNX13|23161 | SNX13|23161 | 0.7140688 | 0.5138993 | 10.420350 | 23.7538159 | 0.0000028 |
MAST1|22983 | MAST1|22983 | 1.2083085 | 1.4453294 | 1.163619 | 5.4418431 | 0.0210366 |
TFAM|7019 | TFAM|7019 | 0.6495017 | 0.4275889 | 9.127099 | 166.6738831 | 0.0000000 |
voom
: normalized expression matrix3358_Non-Tumor | 3358_Tumor | 3387_Non-Tumor | 3387_Tumor | 4619_Non-Tumor | 4619_Tumor | |
---|---|---|---|---|---|---|
GSK3B|2932 | 10.135067 | 10.202899 | 10.235435 | 10.402782 | 9.9837220 | 10.808928 |
TEC|7006 | 5.568906 | 6.192148 | 6.032989 | 5.679887 | 5.2827638 | 6.604529 |
RTBDN|83546 | -1.898700 | -2.753296 | -1.938555 | -2.690800 | -1.4587032 | -2.126790 |
SNX13|23161 | 10.717619 | 10.464813 | 10.511883 | 10.409041 | 10.4515648 | 12.185029 |
MAST1|22983 | 2.008191 | -1.168334 | -1.938555 | 1.557127 | 0.1262593 | 0.680565 |
TFAM|7019 | 9.589642 | 8.975900 | 9.659963 | 8.963388 | 9.6426160 | 9.449222 |
fstats
: F-statistics for each factor in the model, in this case, just treatmentgene | Treatment_F | Treatment_P.Value | Treatment_Q.Value |
---|---|---|---|
AATF|26574 | 128.433398 | 0.0000000 | 0.0000000 |
ABCB8|11194 | 69.822932 | 0.0000000 | 0.0000000 |
ABCC6P1|653190 | 2.194034 | 0.1407181 | 0.1820415 |
ABCC6P2|730013 | 1.154329 | 0.2844313 | 0.3502849 |
ACAD11|84129 | 49.899739 | 0.0000000 | 0.0000000 |
ACCN2|41 | 96.479576 | 0.0000000 | 0.0000000 |
duplicateCorrelation
among blocks, then uses the blocking variable in the linear model fit.info$block <- rep(1:2,each=nrow(info)/2)
stats.block <- pipeLIMMA(counts = counts,
info = info,
block = info$block,
formula = "~ Treatment")
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## calculating duplicate correlation among replicates ...
## fitting linear model ...
## processing statistics and calculating q-values ...
stats.factorial <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
formula = "~ Treatment + group + Treatment*group")
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting linear model ...
## processing statistics and calculating q-values ...
Sometimes, it may make more sense to use specific contrasts to test for differential expression. For example, let’s say we are interested in how the tumors and non-tumor tissues differ for each of the two groups.
design <- with(info, model.matrix(~ 0 + trt.grp))
colnames(design)<-gsub("trt.grp","",colnames(design))
head(design)
## cntr_a cntr_b tumor_a tumor_b
## 1 0 1 0 0
## 2 0 0 0 1
## 3 0 1 0 0
## 4 0 0 0 1
## 5 1 0 0 0
## 6 0 0 1 0
contrast.matrix <- makeContrasts(
tumor_a - cntr_a ,
tumor_b - cntr_b,
levels = design)
head(contrast.matrix)
## Contrasts
## Levels tumor_a - cntr_a tumor_b - cntr_b
## cntr_a -1 0
## cntr_b 0 -1
## tumor_a 1 0
## tumor_b 0 1
stats <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
design = design,
contrast.matrix = contrast.matrix)
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting model to contrast matrix ...
## processing statistics and calculating q-values ...
stats.contrasts <- stats$stats
The function makeBinarySig
looks for a provided string (e.g. “Q.Value”) and outputs a matrix with whether or not those columns have values <= the provided alpha
sigs <- makeBinarySig(stats.contrasts, what = "Qvalue", alpha = 0.05)
## ebayesQvalue_tumor_a...cntr_a 627
## ebayesQvalue_tumor_b...cntr_b 665
counts2Venn(x=sigs, cols=c(1,2), names=c("in.grpA","in.grpB"),
colors=c("blue","red"),type="limma", legx=-3.3,legy=-3)
counts2Venn(x=sigs, cols=c(1,2), names=c("in.grpA","in.grpB"),
colors=c("blue","red"),type="Euler", legx=-3.3,legy=-3)
Volcano plots are a good way to look at the differences between two experimental levels. Here, we compare the extent of differential expression between the “high” treatment to the “low” treatment.
with(lmStats, volcanoPlot(pval = ebayesPvalue_Treatmenttumor,
lfc = Treatmenttumor_logFC,
sig = ebayesQvalue_Treatmenttumor,
main = "no tumor vs. tumor Volcano Plot",
xlab = "tumor - no tumor Log2 Fold Change",
bty = "n", legpos = "top", leginset = c(0,-.1)))
## sig
## updown FALSE TRUE
## down 130 385
## up 168 317
sigs <- data.frame(makeBinarySig(stats.contrasts, what = "Qvalue", alpha = 0.05))
## ebayesQvalue_tumor_a...cntr_a 627
## ebayesQvalue_tumor_b...cntr_b 665
names(sigs)<-c("sig.a","sig.b")
sigs$sign.a<-sign(stats.contrasts$tumor_a...cntr_a_logFC)
sigs$sign.b<-sign(stats.contrasts$tumor_b...cntr_b_logFC)
cols<-with(sigs, ifelse(sig.a + sig.b == 0,
"grey",
ifelse(sig.a + sig.b == 2 & sign.a*sign.b == 1,
"pink",
ifelse(sig.a + sig.b == 2 & sign.a*sign.b == -1,
"cornflowerblue",
ifelse(sig.a == 1, "darkblue", "darkred")))))
with(stats.contrasts, volcanoPair(lfc1 = tumor_a...cntr_a_logFC,
lfc2 = tumor_b...cntr_b_logFC,
pt.col = cols, pt.pch = 19, pt.cex=.5,
xlab = "Tumor - control LFC (group A)",
ylab = "Tumor - control LFC (group B)"))
## pt.col
## cornflowerblue darkblue darkred grey pink
## 28 123 161 212 476
It is often a good idea to get an idea of how the individuals (libraries) are structured - how similar are they to eachother. Principal component analyses allow for this kind of inference.
pca12 <- counts2PCA(counts=voom, info = info, ids = info$id, pcas2return = 6)
pca12.var <- pca12[[2]]
pca12 <- pca12[[1]]
gcols <- c("darkblue", "blue", "gold", "green", "pink", "red")
ggplot(pca12, aes(x = PC1, y = PC2, col = group, shape = Treatment)) +
geom_vline(xintercept = 0, lty = 2)+
geom_hline(yintercept = 0, lty = 2)+
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background = element_blank()) +
labs(x = paste("PCA1 (", pca12.var[1],"%)", sep = ""),
y = paste("PCA2 (", pca12.var[2],"%)", sep = ""),
title = "PCA analysis of voom-normalized expression")
For more information, visit lovelleeb.weebly.com/analytics