Data overview

Last updated: 2018-04-09

Code version: a211c93

Overview

We collected two types of data for each single-cell sample: single-cell RNA-seq data using C1 plates and FUCCI image intensity data.

Raw RNA-seq data: data/eset-raw.rds

The filtered data are stored as follows:

Combined intensity and RNA-seq data: data/eset-filtered.rds
FUCCI intensity data: data/intensity.rds
RNA-seq data: output/gene-filtering.Rmd/eset-filtered.rdata

FUCCI intensity data

Combined intensity data are stored in data/intensity.rds. These include samples that were identified as having a single nucleus.
The data were generated by combine-intensity-data.R (data/combine-intensity-data.R), which combines the image analysis output stored in /project2/gilad/fucci-seq/intensities_stats/ into one data.frame and computes summary statistics, including background-corrected RFP and GFP intensity measures.

ints <- readRDS(file="../data/intensity.rds")
colnames(ints)

 [1] "plate"                 "image"                
 [3] "size"                  "perimeter"            
 [5] "eccentricity"          "rfp.fore.zoom.mean"   
 [7] "rfp.fore.zoom.median"  "gfp.fore.zoom.mean"   
 [9] "gfp.fore.zoom.median"  "dapi.fore.zoom.mean"  
[11] "dapi.fore.zoom.median" "rfp.back.zoom.mean"   
[13] "rfp.back.zoom.median"  "gfp.back.zoom.mean"   
[15] "gfp.back.zoom.median"  "dapi.back.zoom.mean"  
[17] "dapi.back.zoom.median" "rfp.mean.log10sum"    
[19] "gfp.mean.log10sum"     "dapi.mean.log10sum"   
[21] "rfp.median.log10sum"   "gfp.median.log10sum"  
[23] "dapi.median.log10sum"  "unique"               
[25] "chip_id"

Sequencing data

Raw data from each C1 plate are stored separately in data/eset/ by experiment (batch) ID.
Raw data combined across C1 plates are stored in data/eset-raw.rds.
Filtered data are stored in output/gene-filtering.Rmd/eset-filtered.rdata (both the eSet and the CPM matrix). These are filtered for genes and high-quality samples using the sequencing data results.

load(file="../output/gene-filtering.Rmd/eset-filtered.rdata")
eset_filtered

ExpressionSet (storageMode: lockedEnvironment)
assayData: 11721 features, 1025 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: 20170905-A01 20170905-A02 ... 20170924-H12 (1025
    total)
  varLabels: experiment well ... filter_all (43 total)
  varMetadata: labelDescription
featureData
  featureNames: EGFP ENSG00000000003 ... mCherry (11721 total)
  fvarLabels: chr start ... source (6 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:

str(cpm_filtered)

 num [1:11721, 1:1025] 294 187 0 147 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:11721] "EGFP" "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" ...
  ..$ : chr [1:1025] "20170905-A01" "20170905-A02" "20170905-A03" "20170905-A06" ...
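
For orientation, counts per million (CPM) rescales each sample's counts by its library size. The sketch below applies the plain definition of CPM to a made-up count matrix. Whether cpm_filtered was computed exactly this way (for example, without normalization factors or a log transform) is an assumption, and counts_toy, lib_size, and cpm_toy are hypothetical names.

```r
# Toy count matrix: 3 genes (rows) by 2 samples (columns); values are made up.
counts_toy <- matrix(c(10, 0, 90,
                       5, 5, 40),
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"),
                                     c("sample1", "sample2")))

# Library size = total counts per sample (column sums).
lib_size <- colSums(counts_toy)

# Divide each column by its library size, then scale to one million.
cpm_toy <- sweep(counts_toy, 2, lib_size, "/") * 1e6

# By construction, each column of a CPM matrix sums to one million.
colSums(cpm_toy)
```

Like cpm_filtered, the result keeps genes in rows and samples in columns, so the two objects can be indexed the same way.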

Match intensity with sequencing data

We match labels in the intensity data and in the sequencing data. 990 samples are quantified in both datasets.

pdata <- pData(eset_filtered)
pdata$unique <- paste(pdata$image_individual, sprintf("%05d", pdata$image_label), sep="_")

sample_include_bothdata <- intersect(ints$unique, pdata$unique)
length(sample_include_bothdata)

[1] 990
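
The key construction above can be illustrated on toy data. This is a minimal sketch: the individual labels and image labels below are made up and do not reflect the real values in pdata or ints, and every object name ending in _toy is hypothetical.

```r
# Toy sample metadata; column names mirror pdata, values are made up.
pdata_toy <- data.frame(image_individual = c("indA_indB", "indA_indB", "indB_indC"),
                        image_label = c(1L, 12L, 3L),
                        stringsAsFactors = FALSE)

# Zero-pad the image label to five digits and join it to the
# individual label with "_" to form the matching key.
pdata_toy$unique <- paste(pdata_toy$image_individual,
                          sprintf("%05d", pdata_toy$image_label), sep = "_")

# Toy intensity keys: two also appear in pdata_toy, one does not.
ints_unique_toy <- c("indA_indB_00012", "indB_indC_00003", "indB_indC_00099")

# Keys present in both tables, in the order of the first argument.
shared_toy <- intersect(ints_unique_toy, pdata_toy$unique)
length(shared_toy)
```

Zero-padding matters here: without it, a label such as 12 would produce "indA_indB_12" and fail to match a padded key in the intensity table.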

Make a combined eSet object that includes both FUCCI intensity data and RNA-seq data.

# Keep intensity rows for samples present in both datasets
ints_combo <- ints[which(ints$unique %in% sample_include_bothdata), ]

# Subset expression values and sample metadata to the same samples
eset_combo <- new("ExpressionSet", 
              exprs = exprs(eset_filtered)[,which(pdata$unique %in% sample_include_bothdata)], 
              phenoData = phenoData(eset_filtered)[which(pdata$unique %in% sample_include_bothdata), ], 
              featureData = featureData(eset_filtered))

# Rebuild the matching key for the subsetted samples
pdata_combo <- pData(eset_combo)
pdata_combo$unique <- paste(pdata_combo$image_individual, 
                            sprintf("%05d", pdata_combo$image_label), sep="_")

# Check that both tables list the samples in the same order
all.equal(ints_combo$unique, pdata_combo$unique)
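
The all.equal() check matters because %in% filters rows but does not reorder them, so the two tables line up only if they were already sorted the same way. As a hedged alternative, not the approach used in this analysis, match() can align one table to the other explicitly. The toy objects below (keys_expr, ints_toy, ints_in, ints_aligned) are hypothetical.

```r
# Toy keys in two different orders; values are made up.
keys_expr <- c("s3", "s1", "s2")   # sample order in the expression data
ints_toy  <- data.frame(unique = c("s1", "s2", "s3", "s4"),
                        size = c(10, 20, 30, 40),
                        stringsAsFactors = FALSE)

# %in% keeps the original row order of ints_toy, which differs from keys_expr.
ints_in <- ints_toy[ints_toy$unique %in% keys_expr, ]
isTRUE(all.equal(ints_in$unique, keys_expr))       # FALSE: same keys, different order

# match() returns, for each expression key, its row position in ints_toy,
# so indexing with it reorders the intensity rows to follow keys_expr.
ints_aligned <- ints_toy[match(keys_expr, ints_toy$unique), ]
isTRUE(all.equal(ints_aligned$unique, keys_expr))  # TRUE
```

If the check on the real data returns TRUE, the %in% subsetting is sufficient and no reordering is needed.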

# Extend the label descriptions with one row per added intensity variable
pdata_table <- rbind(varMetadata(phenoData(eset_combo)),
                    "mCherry background-corrected intensity (log10sum)",
                    "EGFP background-corrected intensity (log10sum)",
                    "DAPI background-corrected intensity (log10sum)",
                    "nucleus size",
                    "nucleus perimeter",
                    "nucleus eccentricity")
rownames(pdata_table) <- c(rownames(varMetadata(phenoData(eset_combo))),
                           "rfp.median.log10sum",
                           "gfp.median.log10sum",
                           "dapi.median.log10sum",
                           "size", "perimeter", "eccentricity")

# Append the intensity measures to the sample metadata
phenoData(eset_combo) <- new("AnnotatedDataFrame", 
                           data = data.frame(pData(eset_combo),
                                             rfp.median.log10sum=ints_combo$rfp.median.log10sum,
                                             gfp.median.log10sum=ints_combo$gfp.median.log10sum,
                                             dapi.median.log10sum=ints_combo$dapi.median.log10sum,
                                             size=ints_combo$size,
                                             perimeter=ints_combo$perimeter,
                                             eccentricity=ints_combo$eccentricity),
                          varMetadata = pdata_table)

# Save the combined object
saveRDS(eset_combo, file = "../data/eset-filtered.rds")

Access expressionSets

We store feature-level (gene) read counts and molecule counts in ExpressionSet objects (data/eset), which also contain sample metadata (e.g., assigned individual ID, cDNA concentration) and quality filtering criteria (e.g., number of reads mapped to FUCCI transgenes, ERCC conversion rate). Data from different C1 plates are stored in separate eset objects.

To combine eset objects from the different C1 plates:

eset <- Reduce(combine, Map(readRDS, Sys.glob("data/eset/*.rds")))
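
The Reduce/Map pattern in this one-liner can be tried without the real files. In the sketch below, plain data frames written to a temporary directory stand in for the per-plate ExpressionSet files, and rbind stands in for Biobase's combine(). The directory, plate names, and object names are all hypothetical.

```r
# Write three toy per-plate tables to a temporary directory
# (stand-ins for the files matched by data/eset/*.rds).
tmp_dir <- file.path(tempdir(), "eset-toy")
dir.create(tmp_dir, showWarnings = FALSE)
for (plate in c("plateA", "plateB", "plateC")) {
  saveRDS(data.frame(plate = plate, well = c("A01", "A02"),
                     stringsAsFactors = FALSE),
          file.path(tmp_dir, paste0(plate, ".rds")))
}

# Sys.glob() expands the wildcard into the matching file paths.
files <- Sys.glob(file.path(tmp_dir, "*.rds"))

# Map() reads each file into a list element; Reduce() then folds the
# list pairwise into one object. rbind is used here only because the
# toy objects are data frames; the real code uses combine().
combined_toy <- Reduce(rbind, Map(readRDS, files))
dim(combined_toy)
```

Because Reduce() applies the function to two objects at a time, the same call works for any number of plates.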

To access data stored in an ExpressionSet:

exprs(eset): access count data, 20,421 features by 1,536 single-cell samples.
pData(eset): access sample metadata. Returns data.frame of 1,536 samples by 43 labels. Use varMetadata(phenoData(eset)) to view label descriptions.
fData(eset): access feature metadata. Returns data.frame of 20,421 features by 6 labels. Use varMetadata(featureData(eset)) to view label descriptions.
varMetadata(phenoData(eset)): view the sample metadata labels.
varMetadata(featureData(eset)): view the feature (gene) metadata labels.

Session information

R version 3.4.1 (2017-06-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Biobase_2.38.0      BiocGenerics_0.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16    digest_0.6.15   rprojroot_1.3-2 backports_1.1.2
 [5] git2r_0.21.0    magrittr_1.5    evaluate_0.10.1 stringi_1.1.7  
 [9] rmarkdown_1.9   tools_3.4.1     stringr_1.3.0   yaml_2.1.18    
[13] compiler_3.4.1  htmltools_0.3.6 knitr_1.20
