Last updated: 2017-12-11
Code version: a5ec074
Previously, when investigating measurement variation (GFP/RFP/DAPI), we learned that there is significant variation between batches in the distributions of background-corrected pixel intensities. See here.
\(~\)
Approach:
In this document, I apply quantile normalization to intensity measurements on a per-channel basis. The approach is as follows:
1. Construct a reference intensity vector and estimate its \(k\)-quantiles \(Q^{R,k}= \big( {q^{R,k}_{[1]}, \dots, q^{R,k}_{[n_k]}} \big)\).
2. For each plate \(i\), estimate the \(l\)-quantiles of intensities on a per-plate basis, \(Q^{i,l}= \big( {q^{i,l}_{[1]}, \dots, q^{i,l}_{[n_l]}} \big)\).
3. For each plate \(i\), compare the intensity value \(F_{ij}\) of image/well \(j\) with the quantile values \(\big( {q^{i,l}_{[1]}, \dots, q^{i,l}_{[n_l]}} \big)\) and assign image/well \(j\) to the quantile with the closest intensity value, i.e., \(q^{i,l}_{[m]}\) where \(m = \arg\min_{s \in \{1,\dots,n_l\}} |F_{ij}- q^{i,l}_{[s]}|\). Then substitute the intensity value with the \(m\)-th quantile value of the reference intensity, \(q^{R,k}_{[m]}\); this uses the same number of quantiles for the reference and the plates (\(k = l\)).
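To make the last step concrete, here is a toy illustration with made-up numbers (five quantile values per vector; the names `q_plate`, `q_ref`, and `F_ij` are only for this example and do not come from the analysis code).

```r
# Toy quantiles (made-up numbers): 5 quantile values for one plate and for the reference.
q_plate <- c(0.10, 0.40, 0.70, 1.20, 2.00)
q_ref   <- c(0.20, 0.50, 0.90, 1.50, 2.50)

F_ij <- 0.65                          # intensity of one image/well on this plate
m <- which.min(abs(F_ij - q_plate))   # index of the closest plate quantile
m                                     # 3, because |0.65 - 0.70| is the smallest gap
q_ref[m]                              # 0.9, the value substituted for F_ij
```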
\(~\)
I tried two methods for constructing the reference intensity vector, and the results differ substantially depending on the method chosen.
Method 1: Aggregate intensity values across plates.
Method 2: Take the average of \(n\)-quantiles across plates.
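As a rough sketch of how the two references differ, the snippet below builds both from simulated data. The list `plates` (one numeric vector of intensities per plate), the plate names, and the 0.01 step are assumptions for illustration only, not the objects used in this analysis.

```r
# Hypothetical input: a named list with one numeric vector of intensities per plate.
set.seed(1)
plates <- list(
  plateA = rnorm(96, mean = 1.0, sd = 0.3),
  plateB = rnorm(96, mean = 1.5, sd = 0.5),
  plateC = rnorm(96, mean = 0.8, sd = 0.2)
)
probs <- seq(0, 1, by = 0.01)

# Method 1: pool all intensities across plates, then take quantiles of the pool.
ref_pooled <- quantile(unlist(plates, use.names = FALSE), probs = probs)

# Method 2: take quantiles within each plate, then average each quantile across plates.
quants_by_plate <- sapply(plates, quantile, probs = probs)  # rows: quantiles, columns: plates
ref_averaged <- rowMeans(quants_by_plate)

# The pooled reference spans the extremes of all plates;
# the averaged reference spans the mean of the per-plate extremes.
range(ref_pooled)
range(ref_averaged)
```

With plates on different scales, the pooled reference is shaped mostly by the widest plates, while the averaged reference weights each plate equally.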
\(~\)
Results:
1. We used 200-quantiles (1/0.005, i.e., a step of 0.005 in probability) for all three channels. See the document for our exploratory analysis of intensities from all three channels.
2. Method 1 versus Method 2: In Method 1, the reference distribution of Green/Red is denser toward low and high intensities, so the normalized values fall closer to the boundaries.
3. Method 2 of constructing the reference produces better results: the relationship between Green and Red is preserved after normalization.
4. In Method 2, after normalization, the range of intensities is the same between plates for each of the three channels (Green, Red, DAPI). As a result, many of the images/wells with low intensities decreased in intensity value.
5. Because of point 4, the distances between samples in many plates increase rather than decrease. We were looking for a decrease in the distances between samples, i.e., tighter clusters or smaller within-cluster distances…
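One simple way to quantify the distances between samples is the mean pairwise Euclidean distance within a plate. The sketch below uses simulated Green/Red values and a plain rescaling as a stand-in for a normalization that widens a plate's range. The helper name `mean_pairwise_dist` and the data are hypothetical and are not the metric or data behind the results above.

```r
# Mean pairwise Euclidean distance between samples (rows) of a numeric matrix.
mean_pairwise_dist <- function(x) mean(dist(x))

# Simulated Green/Red intensities for one hypothetical plate.
set.seed(3)
before <- cbind(green = rnorm(40, mean = 1, sd = 0.2),
                red   = rnorm(40, mean = 1, sd = 0.2))

# Stand-in for a normalization that stretches the plate's range by 1.5x.
after <- before * 1.5

mean_pairwise_dist(before)
mean_pairwise_dist(after)   # larger: stretching the range increases all distances
```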
ints <- readRDS(file="/project2/gilad/joycehsiao/fucci-seq/data/intensity.rds")
\(~\)
\(~\)
First, look at the distribution of all batches combined versus each batch.
\(~\)
\(~\)
\(~\)
Code for one single sample
\(~\)
my_quantnorm <- function(reference, sample, span=.01) {
  # quantiles of the reference intensities (e.g., all samples across plates)
  quants_reference <- quantile(reference, probs=seq(0,1,span))
  # quantiles of the intensities for the given plate
  quants_sample <- quantile(sample, probs=seq(0,1,span))
  # empty vector for normalized values
  sample_normed <- vector("numeric", length=length(sample))
  for (index in seq_along(sample)) {
    # for each sample, find the closest sample quantile
    sample_order <- names(which.min(abs(sample[index]-quants_sample)))
    # get the reference quantile value at the same quantile level
    # (look it up in quants_reference, not in the raw reference vector)
    ref_order_value <- quants_reference[[sample_order]]
    # assign the reference quantile value to the sample
    sample_normed[index] <- ref_order_value
  }
  return(sample_normed)
}
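The loop above handles one value at a time. A vectorized variant might look like the sketch below, which looks up the replacement value in the reference quantiles as described in the Approach section. The function name `quantnorm_vec`, the simulated intensities, and the plate labels are assumptions for illustration; this is not the code that produced the figures.

```r
# Vectorized nearest-quantile normalization (a sketch under the assumptions above).
quantnorm_vec <- function(reference, sample, span = 0.01) {
  probs <- seq(0, 1, by = span)
  quants_reference <- quantile(reference, probs = probs, names = FALSE)
  quants_sample    <- quantile(sample, probs = probs, names = FALSE)
  # |sample value - each sample quantile|: one row per value, one column per quantile
  gaps <- abs(outer(sample, quants_sample, "-"))
  # index of the closest quantile for each value (first one on ties, like which.min)
  nearest <- apply(gaps, 1, which.min)
  quants_reference[nearest]
}

# Simulated intensities for two hypothetical plates with different scales.
set.seed(2)
plate <- rep(c("p1", "p2"), each = 50)
intensity <- c(rnorm(50, mean = 1, sd = 0.2), rnorm(50, mean = 2, sd = 0.6))

# Normalize each plate against the pooled reference, keeping the original order.
normed <- ave(intensity, plate, FUN = function(x) quantnorm_vec(intensity, x))

# After normalization every plate spans the range of the reference.
tapply(normed, plate, range)
```

Because each plate's minimum and maximum map to the reference minimum and maximum, all plates end up with the same range.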
\(~\)
\(~\)
RFP
\(~\)
\(~\)
GFP
\(~\)
\(~\)
DAPI
\(~\)
\(~\)
Method 1 constructs the reference intensity vector by aggregating all image intensity values across plates.
\(~\)
\(~\)
Distribution of the reference
\(~\)
\(~\)
After normalization
\(~\)
\(~\)
Green versus Red intensities by plate, labeled by DAPI
\(~\)
\(~\)
Green versus Red intensities by individual, labeled by DAPI
\(~\)
\(~\)
Green
\(~\)
\(~\)
Red
\(~\)
\(~\)
DAPI
\(~\)
\(~\)
\(~\)
Reference intensity vector: average of quantile values across plates.
\(~\)
\(~\)
After normalization
\(~\)
\(~\)
Green versus Red intensities by plate, labeled by DAPI
\(~\)
\(~\)
Green versus Red intensities by individual, labeled by DAPI
\(~\)
\(~\)
Green
\(~\)
\(~\)
Red
\(~\)
\(~\)
DAPI
\(~\)
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)
Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Biobase_2.38.0 BiocGenerics_0.24.0 RColorBrewer_1.1-2
[4] wesanderson_0.3.2 cowplot_0.8.0 ggplot2_2.2.1
[7] dplyr_0.7.0 data.table_1.10.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 knitr_1.16 magrittr_1.5 munsell_0.4.3
[5] colorspace_1.3-2 R6_2.2.0 rlang_0.1.2 stringr_1.2.0
[9] plyr_1.8.4 tools_3.4.1 grid_3.4.1 gtable_0.2.0
[13] git2r_0.19.0 htmltools_0.3.6 lazyeval_0.2.0 yaml_2.1.14
[17] rprojroot_1.2 digest_0.6.12 assertthat_0.1 tibble_1.3.3
[21] glue_1.1.1 evaluate_0.10.1 rmarkdown_1.6 labeling_0.3
[25] stringi_1.1.2 compiler_3.4.1 scales_0.4.1 backports_1.0.5