Last updated: 2017-12-23

Code version: 0fb561f


Quick start

We store feature-level (gene) read counts and molecule counts in ExpressionSet objects (saved in data/eset). These objects also contain sample metadata (e.g., assigned individual ID, cDNA concentration) and quality filtering criteria (e.g., number of reads mapped to the FUCCI transgenes, ERCC conversion rate). Data from different C1 plates are stored in separate eset objects, one .rds file per plate.


To combine eset objects from the different C1 plates:

library(Biobase)  # provides the combine() method for ExpressionSet objects
eset <- Reduce(combine, Map(readRDS, Sys.glob("../data/eset/*.rds")))


To access data stored in an ExpressionSet:

  • exprs(eset): access the count data, a matrix of 20,421 features by 1,536 single-cell samples.

  • pData(eset): access sample metadata. Returns a data.frame of 1,536 samples by 43 labels. Use varMetadata(phenoData(eset)) to view label descriptions.

  • fData(eset): access feature metadata. Returns a data.frame of 20,421 features by 6 labels. Use varMetadata(featureData(eset)) to view label descriptions.
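As a sketch of how these accessors fit together, the function below keeps only the samples whose filter_all flag is TRUE, using standard ExpressionSet column subsetting, and then reads the counts and metadata of the retained cells. It assumes filter_all is stored as a logical column. subset_qc is a hypothetical helper name, and the three-sample toy ExpressionSet (made-up values) only exists so the snippet runs without the data/eset files.

```r
library(Biobase)

# Hypothetical helper: keep samples that pass all QC filters.
# Assumes pData(eset)$filter_all is logical; NA values are treated as failing.
subset_qc <- function(eset) {
  keep <- pData(eset)$filter_all
  keep[is.na(keep)] <- FALSE
  eset[, keep]
}

# Toy ExpressionSet with made-up values: 2 genes x 3 samples.
counts <- matrix(c(0L, 5L, 2L, 7L, 1L, 0L), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
pheno <- AnnotatedDataFrame(
  data.frame(filter_all = c(TRUE, FALSE, TRUE), row.names = c("s1", "s2", "s3"))
)
toy <- ExpressionSet(assayData = counts, phenoData = pheno)

toy_qc <- subset_qc(toy)
exprs(toy_qc)   # count matrix of the retained samples (s1, s3)
pData(toy_qc)   # sample metadata of the retained samples
```

On the real data the same call would be eset_qc <- subset_qc(eset).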


Label description

Load the required packages.

library(knitr)
library(Biobase)

To combine all ExpressionSet objects in the data/eset folder,

fname <- Sys.glob("../data/eset/*.rds")
eset <- Reduce(combine, Map(readRDS, fname))

View the sample metadata labels,

varMetadata(phenoData(eset))
label               labelDescription
experiment          ID of C1 chip (i.e. processing date in YYYYMMDD)
well                Well of C1 chip (96 total, rows A-H, cols 1-12)
cell_number         The number of cells observed in the well via microscopy
concentration       The cDNA concentration of the well prior to library prep
ERCC                The dilution factor of the ERCC spike-ins
individual.1        Individual # 1 included on this C1 chip
individual.2        Individual # 2 included on this C1 chip
image_individual    The chip label for the image files
image_label         The well label for the image files
raw                 The number of raw reads
umi                 The number of reads with a valid UMI
mapped              The number of reads with a valid UMI that mapped to a genome
unmapped            The number of reads with a valid UMI that did not map to a genome
reads_ercc          The number of reads that mapped to the ERCC spike-in transcripts
reads_hs            The number of reads that mapped to the H. sapiens genome
reads_egfp          The number of reads that mapped to the FUCCI EGFP transgene
reads_mcherry       The number of reads that mapped to the FUCCI mCherry transgene
molecules           The number of molecules (i.e. post UMI-deduplication)
mol_ercc            The number of molecules that mapped to the ERCC spike-in transcripts
mol_hs              The number of molecules that mapped to the H. sapiens genome
mol_egfp            The number of molecules that mapped to the FUCCI EGFP transgene
mol_mcherry         The number of molecules that mapped to the FUCCI mCherry transgene
detect_ercc         The number of ERCC genes with at least one molecule
detect_hs           The number of H. sapiens genes with at least one molecule
chip_id             verifyBamID: The predicted individual based on the sequencing data
chipmix             verifyBamID: chipmix is a metric for detecting sample swaps
freemix             verifyBamID: freemix is a measure of contamination. 0 == good & 0.5 == bad
snps                verifyBamID: The number of SNPs that passed thresholds for AF and missingness
reads               verifyBamID: The number of sequences that overlapped SNPs
avg_dp              verifyBamID: The average sequencing depth that covered a SNP
min_dp              verifyBamID: A minimum depth threshold for QC only (affects snps_w_min)
snps_w_min          verifyBamID: The number of SNPs that had the minimum depth (min_dp); QC only
valid_id            verifyBamID: Is the predicted individual 1 of the 2 added to the C1 chip?
cut_off_reads       QC filter: number of mapped reads > 85th percentile among zero-cell samples
unmapped_ratios     QC filter: among reads with a valid UMI, number of unmapped/number of mapped (unmapped/umi)
cut_off_unmapped    QC filter: unmapped ratio < 30th percentile among zero-cell samples
ercc_percentage     QC filter: number of reads mapped to ERCC/total sample mapped reads (reads_ercc/mapped)
cut_off_ercc        QC filter: ERCC percentage < 15th percentile among zero-cell samples
cut_off_genes       QC filter: number of endogenous genes with at least one molecule (detect_hs) > 85th percentile among zero-cell samples
ercc_conversion     QC filter: among ERCC, number of molecules/number of mapped reads (mol_ercc/reads_ercc)
conversion          QC filter: among endogenous genes, number of molecules/number of mapped reads (mol_hs/reads_hs)
conversion_outlier  QC filter: microscopy detects 1 cell AND ERCC conversion rate > 0.094
filter_all          QC filter: Does the sample pass all the QC filters? cell_number==1, mol_egfp >0, valid_id==1, cut_off_reads==TRUE, cut_off_ercc==TRUE, cut_off_genes==TRUE
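Several of the QC columns above are simple ratios of the raw count columns. The sketch below recomputes them from a sample metadata data.frame such as pData(eset), following the formulas quoted in the label descriptions, so the stored values can be cross-checked. qc_ratios is a hypothetical helper; unmapped_ratios is left out because its description names two different denominators (mapped and umi). The two-sample data.frame is made-up toy input.

```r
# Hypothetical helper: recompute ratio-type QC columns from raw counts,
# using the formulas quoted in the sample metadata descriptions.
qc_ratios <- function(pd) {
  ercc_conversion <- pd$mol_ercc / pd$reads_ercc
  data.frame(
    ercc_percentage    = pd$reads_ercc / pd$mapped,   # reads_ercc/mapped
    ercc_conversion    = ercc_conversion,             # mol_ercc/reads_ercc
    conversion         = pd$mol_hs / pd$reads_hs,     # mol_hs/reads_hs
    # one cell by microscopy AND ERCC conversion rate > 0.094
    conversion_outlier = pd$cell_number == 1 & ercc_conversion > 0.094,
    row.names = rownames(pd)
  )
}

# Toy input (made-up numbers) with only the columns the helper needs.
pd <- data.frame(
  mapped      = c(1000, 2000),
  reads_ercc  = c(100, 50),
  mol_ercc    = c(10, 4),
  reads_hs    = c(900, 1950),
  mol_hs      = c(90, 390),
  cell_number = c(1, 1),
  row.names   = c("s1", "s2")
)
qc_ratios(pd)
```

With the real data, qc_ratios(pData(eset)) returns one row per sample that can be compared against the stored columns, for example with all.equal().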

View the feature (gene) metadata labels,

varMetadata(featureData(eset))
label     labelDescription
chr       Chromosome
start     Most 5’ start position (GRCh37/hg19; 1-based; inclusive)
end       Most 3’ end position (GRCh37/hg19; 1-based; inclusive)
name      Gene name
strand    Strand (+ = positive/forward; - = negative/reverse)
source    Source of RNA
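Because start and end are 1-based and inclusive, a feature's genomic span is end - start + 1 bases (a feature with start equal to end covers one base). A minimal sketch, assuming the fData columns are named as in the table above; gene_span is a hypothetical helper and the coordinates are made up. The result is the span from the most 5’ to the most 3’ position, which includes introns, so it is not a transcript length. abs() is used so the result does not depend on whether start < end for reverse-strand genes.

```r
# Hypothetical helper: genomic span of each feature from 1-based,
# inclusive coordinates (includes introns; not a transcript length).
gene_span <- function(fd) {
  span <- abs(fd$end - fd$start) + 1   # +1 because both ends are inclusive
  names(span) <- rownames(fd)
  span
}

# Toy feature metadata (made-up coordinates).
fd <- data.frame(
  chr       = c("1", "2"),
  start     = c(100, 5000),
  end       = c(199, 5000),
  strand    = c("+", "-"),
  row.names = c("geneA", "geneB")
)
gene_span(fd)   # geneA spans 100 bases, geneB a single base
```

On the real data this would be gene_span(fData(eset)); any feature with missing coordinates would yield NA.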

Session information

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Biobase_2.38.0      BiocGenerics_0.24.0 knitr_1.17         

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14    digest_0.6.12   rprojroot_1.2   backports_1.0.5
 [5] git2r_0.19.0    magrittr_1.5    evaluate_0.10.1 highr_0.6      
 [9] stringi_1.1.2   rmarkdown_1.8   tools_3.4.1     stringr_1.2.0  
[13] yaml_2.1.16     compiler_3.4.1  htmltools_0.3.6

This R Markdown site was created with workflowr