Data overview

Last updated: 2017-12-23

Code version: 0fb561f

Quick start

We store feature-level (gene) read counts and molecule counts in ExpressionSet objects (data/eset), which also contain sample metadata (e.g., assigned individual ID, cDNA concentration) and quality filtering criteria (e.g., number of reads mapped to FUCCI transgenes, ERCC conversion rate). Data from different C1 plates are stored in separate eset objects (the .rds files under data/eset).

To combine eset objects from the different C1 plates:

library(Biobase)  # needed for combine() on ExpressionSet objects
eset <- Reduce(combine, Map(readRDS, Sys.glob("../data/eset/*.rds")))

To access data stored in the ExpressionSet:

exprs(eset): access count data, 20,421 features by 1,536 single cell samples.
pData(eset): access sample metadata. Returns data.frame of 1,536 samples by 43 labels. Use varMetadata(phenoData(eset)) to view label descriptions.
fData(eset): access feature metadata. Returns data.frame of 20,421 features by 6 labels. Use varMetadata(featureData(eset)) to view label descriptions.
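The three accessors can be tried on a small hand-built object before loading the real data. The following is a minimal sketch, assuming Biobase is installed; the object `toy`, its gene and cell names, and all values are invented for illustration and are not part of the study data.

```r
library(Biobase)

# Toy count matrix: 3 features (rows) by 2 samples (columns); values are made up
counts <- matrix(c(5L, 0L, 2L, 1L, 3L, 0L), nrow = 3,
                 dimnames = list(c("gene1", "gene2", "gene3"),
                                 c("cellA", "cellB")))

# Sample metadata: one row per sample, row names must match colnames(counts)
pheno <- data.frame(cell_number = c(1L, 0L), row.names = colnames(counts))

# Feature metadata: one row per feature, row names must match rownames(counts)
feat <- data.frame(chr = c("chr1", "chr2", "chrX"),
                   row.names = rownames(counts),
                   stringsAsFactors = FALSE)

toy <- ExpressionSet(assayData = counts,
                     phenoData = AnnotatedDataFrame(pheno),
                     featureData = AnnotatedDataFrame(feat))

dim(exprs(toy))         # features by samples
pData(toy)$cell_number  # one value per sample
fData(toy)$chr          # one value per feature

# Subsetting the object keeps counts and metadata in sync
single <- toy[, pData(toy)$cell_number == 1]
```

The same column subsetting pattern should apply to the combined `eset`, for example to keep only samples that satisfy a given sample metadata label.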

Label description

Load related packages.

library(knitr)
library(Biobase)

To combine all ExpressionSet objects in the folder,

fname <- Sys.glob("../data/eset/*.rds")
eset <- Reduce(combine, Map(readRDS, fname))

View the sample metadata labels,

varMetadata(phenoData(eset))

	labelDescription
experiment	ID of C1 chip (i.e. processing date in YYYYMMDD)
well	Well of C1 chip (96 total, rows A-H, cols 1-12)
cell_number	The number of cells observed in the well via microscopy
concentration	The cDNA concentration of the well prior to library prep
ERCC	The dilution factor of the ERCC spike-ins
individual.1	Individual # 1 included on this C1 chip
individual.2	Individual # 2 included on this C1 chip
image_individual	The chip label for the image files
image_label	The well label for the image files
raw	The number of raw reads
umi	The number of reads with a valid UMI
mapped	The number of reads with a valid UMI that mapped to a genome
unmapped	The number of reads with a valid UMI that did not map to a genome
reads_ercc	The number of reads that mapped to the ERCC spike-in transcripts
reads_hs	The number of reads that mapped to the H. sapiens genome
reads_egfp	The number of reads that mapped to the FUCCI EGFP transgene
reads_mcherry	The number of reads that mapped to the FUCCI mCherry transgene
molecules	The number of molecules (i.e. post UMI-deduplication)
mol_ercc	The number of molecules that mapped to the ERCC spike-in transcripts
mol_hs	The number of molecules that mapped to the H. sapiens genome
mol_egfp	The number of molecules that mapped to the FUCCI EGFP transgene
mol_mcherry	The number of molecules that mapped to the FUCCI mCherry transgene
detect_ercc	The number of ERCC genes with at least one molecule
detect_hs	The number of H. sapiens genes with at least one molecule
chip_id	verifyBamID: The predicted individual based on the sequencing data
chipmix	verifyBamID: chipmix is a metric for detecting sample swaps
freemix	verifyBamID: freemix is a measure of contamination. 0 == good & 0.5 == bad
snps	verifyBamID: The number of SNPs that passed thresholds for AF and missingness
reads	verifyBamID: The number of sequences that overlapped SNPs
avg_dp	verifyBamID: The average sequencing depth that covered a SNP
min_dp	verifyBamID: A minimum depth threshold for QC only (affects snps_w_min)
snps_w_min	verifyBamID: The number of SNPs that had the minimum depth (min_dp); QC only
valid_id	verifyBamID: Is the predicted individual 1 of the 2 added to the C1 chip?
cut_off_reads	QC filter: number of mapped reads > 85th percentile among zero-cell samples
unmapped_ratios	QC filter: among reads with a valid UMI, number of unmapped/number of mapped (unmapped/umi)
cut_off_unmapped	QC filter: unmapped ratio < 30th percentile among zero-cell samples
ercc_percentage	QC filter: number of reads mapped to ERCC/total sample mapped reads (reads_ercc/mapped)
cut_off_ercc	QC filter: ercc percentage < 15th percentile among zero-cell samples
cut_off_genes	QC filter: number of endogenous genes with at least one molecule (detect_hs) > 85th percentile among zero-cell samples
ercc_conversion	QC filter: among ERCC, number of molecules/number of mapped reads (mol_ercc/reads_ercc)
conversion	QC filter: among endogenous genes, number of molecules/number of mapped reads (mol_hs/reads_hs)
conversion_outlier	QC filter: microscopy detects 1 cell AND ERCC conversion rate > .094
filter_all	QC filter: Does the sample pass all the QC filters? cell_number==1, mol_egfp > 0, valid_id==1, cut_off_reads==TRUE, cut_off_ercc==TRUE, cut_off_genes==TRUE
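To make the ratio and percentile definitions in the table above concrete, here is a minimal base R sketch on an invented data frame. The column names follow the labels above, the ratios use the formulas given in parentheses (e.g. unmapped/umi), and the numbers are made up; this illustrates the arithmetic only and is not necessarily the code used to produce the stored values.

```r
# Toy sample metadata; column names follow the label table, values are invented
qc <- data.frame(
  cell_number = c(1, 1, 0, 0),
  umi         = c(1000, 800, 200, 400),
  mapped      = c(900, 600, 100, 200),
  unmapped    = c(100, 200, 100, 200),
  reads_ercc  = c(90, 120, 50, 100),
  mol_ercc    = c(9, 6, 5, 10),
  reads_hs    = c(800, 470, 50, 100),
  mol_hs      = c(400, 235, 10, 20)
)

# Ratios, using the parenthesized formulas from the label descriptions
qc$unmapped_ratios <- qc$unmapped / qc$umi
qc$ercc_percentage <- qc$reads_ercc / qc$mapped
qc$ercc_conversion <- qc$mol_ercc / qc$reads_ercc
qc$conversion      <- qc$mol_hs / qc$reads_hs

# Percentile-based cutoff: compare each sample to the zero-cell samples
zero_cell <- qc$cell_number == 0
reads_threshold <- quantile(qc$mapped[zero_cell], probs = 0.85)
qc$cut_off_reads <- qc$mapped > reads_threshold

qc[, c("cell_number", "unmapped_ratios", "ercc_percentage",
       "ercc_conversion", "conversion", "cut_off_reads")]
```

In the real data these columns are already present in `pData(eset)`, so the sketch is mainly useful for checking how a given label was derived.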

View the feature (gene) metadata labels,

varMetadata(featureData(eset))

	labelDescription
chr	Chromosome
start	Most 5’ start position (GRCh37/hg19; 1-based; inclusive)
end	Most 3’ end position (GRCh37/hg19; 1-based; inclusive)
name	Gene name
strand	Strand (+ = positive/forward; - = negative/reverse)
source	Source of RNA
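Because start and end are 1-based and inclusive, the genomic span covered by a feature is end - start + 1. The sketch below shows this on an invented data frame with the same column names; the coordinates are made up, and the span is the distance between the outermost annotated positions, not a transcript length.

```r
# Toy feature metadata; column names follow the label table, values are invented
feat <- data.frame(
  chr    = c("chr1", "chrX"),
  start  = c(1001, 500),
  end    = c(2000, 500),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# 1-based inclusive coordinates: a feature with start == end covers one base
feat$span <- feat$end - feat$start + 1
feat
```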

Session information

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Biobase_2.38.0      BiocGenerics_0.24.0 knitr_1.17         

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14    digest_0.6.12   rprojroot_1.2   backports_1.0.5
 [5] git2r_0.19.0    magrittr_1.5    evaluate_0.10.1 highr_0.6      
 [9] stringi_1.1.2   rmarkdown_1.8   tools_3.4.1     stringr_1.2.0  
[13] yaml_2.1.16     compiler_3.4.1  htmltools_0.3.6

This R Markdown site was created with workflowr

Joyce Hsiao
