Last updated: 2017-12-23
Code version: 0fb561f
We store feature-level (gene) read counts and molecule counts in ExpressionSet (data/eset) objects, which also contain sample metadata (e.g., assigned individual ID, cDNA concentration) and quality filtering criteria (e.g., number of reads mapped to FUCCI transgenes, ERCC conversion rate). Data from different C1 plates are stored in separate eset objects.
To combine eset objects from the different C1 plates:
eset <- Reduce(combine, Map(readRDS, Sys.glob("../data/eset/*.rds")))
To access data stored in an ExpressionSet:
exprs(eset): access count data, 20,421 features by 1,536 single-cell samples.
pData(eset): access sample metadata. Returns a data.frame of 1,536 samples by 43 labels. Use varMetadata(phenoData(eset)) to view label descriptions.
fData(eset): access feature metadata. Returns a data.frame of 20,421 features by 6 labels. Use varMetadata(featureData(eset)) to view label descriptions.
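The three accessors above can be tried on a small stand-in object before loading the full data. The sketch below builds a toy ExpressionSet with Biobase; the gene names, cell names, counts, and label descriptions are all made up for illustration and are not taken from the study data.

```r
library(Biobase)

# Toy count matrix: 3 features (rows) by 4 samples (columns). Hypothetical values.
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Sample metadata plus a description for each label (column).
pheno <- data.frame(cell_number = c(1, 0, 1, 1),
                    concentration = c(1.2, 0.1, 0.9, 1.5),
                    row.names = colnames(counts))
pheno_desc <- data.frame(
  labelDescription = c("Number of cells observed in the well",
                       "cDNA concentration of the well"),
  row.names = colnames(pheno))

# Feature metadata, one row per gene.
feat <- data.frame(chr = c("chr1", "chr2", "chrX"),
                   row.names = rownames(counts),
                   stringsAsFactors = FALSE)

toy <- ExpressionSet(assayData = counts,
                     phenoData = AnnotatedDataFrame(pheno, pheno_desc),
                     featureData = AnnotatedDataFrame(feat))

dim(exprs(toy))             # features by samples
pData(toy)                  # sample metadata as a data.frame
fData(toy)                  # feature metadata as a data.frame
varMetadata(phenoData(toy)) # label descriptions for the sample metadata
```

The same calls apply unchanged to the combined eset object; only the dimensions differ.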
Load related packages.
library(knitr)
library(Biobase)
To combine all ExpressionSet objects in the folder,
fname <- Sys.glob("../data/eset/*.rds")
eset <- Reduce(combine, Map(readRDS, fname))
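To see what the Reduce(combine, ...) pattern does without the real files, the following sketch writes three toy ExpressionSet objects to a temporary directory and combines them the same way. The plate names, well names, and counts are hypothetical, and the helper make_plate() exists only for this illustration.

```r
library(Biobase)

# Hypothetical helper: one small ExpressionSet per plate, 2 genes by 2 cells.
make_plate <- function(plate) {
  cells <- paste0(plate, "_", c("A01", "A02"))
  counts <- matrix(c(5, 0, 3, 7), nrow = 2,
                   dimnames = list(c("gene1", "gene2"), cells))
  pheno <- data.frame(cell_number = c(1, 1), row.names = cells)
  feat <- data.frame(chr = c("chr1", "chr2"),
                     row.names = rownames(counts),
                     stringsAsFactors = FALSE)
  ExpressionSet(assayData = counts,
                phenoData = AnnotatedDataFrame(pheno),
                featureData = AnnotatedDataFrame(feat))
}

# Save each plate as its own .rds file in a temporary directory.
tmp <- file.path(tempdir(), "eset_demo")
dir.create(tmp, showWarnings = FALSE)
for (plate in c("plate1", "plate2", "plate3")) {
  saveRDS(make_plate(plate), file.path(tmp, paste0(plate, ".rds")))
}

# Read every file, then fold the list pairwise with combine().
toy_files <- Sys.glob(file.path(tmp, "*.rds"))
toy_eset <- Reduce(combine, Map(readRDS, toy_files))

dim(exprs(toy_eset))   # shared features, samples from all plates
sampleNames(toy_eset)
```

Because the plates share the same features but have distinct sample names, combine() keeps one row per gene and appends the samples from each plate as new columns.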
View the sample metadata labels,
varMetadata(phenoData(eset))
label | labelDescription |
---|---|
experiment | ID of C1 chip (i.e. processing date in YYYYMMDD) |
well | Well of C1 chip (96 total, rows A-H, cols 1-12) |
cell_number | The number of cells observed in the well via microscopy |
concentration | The cDNA concentration of the well prior to library prep |
ERCC | The dilution factor of the ERCC spike-ins |
individual.1 | Individual # 1 included on this C1 chip |
individual.2 | Individual # 2 included on this C1 chip |
image_individual | The chip label for the image files |
image_label | The well label for the image files |
raw | The number of raw reads |
umi | The number of reads with a valid UMI |
mapped | The number of reads with a valid UMI that mapped to a genome |
unmapped | The number of reads with a valid UMI that did not map to a genome |
reads_ercc | The number of reads that mapped to the ERCC spike-in transcripts |
reads_hs | The number of reads that mapped to the H. sapiens genome |
reads_egfp | The number of reads that mapped to the FUCCI EGFP transgene |
reads_mcherry | The number of reads that mapped to the FUCCI mCherry transgene |
molecules | The number of molecules (i.e. post UMI-deduplication) |
mol_ercc | The number of molecules that mapped to the ERCC spike-in transcripts |
mol_hs | The number of molecules that mapped to the H. sapiens genome |
mol_egfp | The number of molecules that mapped to the FUCCI EGFP transgene |
mol_mcherry | The number of molecules that mapped to the FUCCI mCherry transgene |
detect_ercc | The number of ERCC genes with at least one molecule |
detect_hs | The number of H. sapiens genes with at least one molecule |
chip_id | verifyBamID: The predicted individual based on the sequencing data |
chipmix | verifyBamID: chipmix is a metric for detecting sample swaps |
freemix | verifyBamID: freemix is a measure of contamination. 0 == good & 0.5 == bad |
snps | verifyBamID: The number of SNPs that passed thresholds for AF and missingness |
reads | verifyBamID: The number of sequences that overlapped SNPs |
avg_dp | verifyBamID: The average sequencing depth that covered a SNP |
min_dp | verifyBamID: A minimum depth threshold for QC only (affects snps_w_min) |
snps_w_min | verifyBamID: The number of SNPs that had the minimum depth (min_dp); QC only |
valid_id | verifyBamID: Is the predicted individual 1 of the 2 added to the C1 chip? |
cut_off_reads | QC filter: number of mapped reads > 85th percentile among zero-cell samples |
unmapped_ratios | QC filter: among reads with a valid UMI, number of unmapped/number of mapped (unmapped/umi) |
cut_off_unmapped | QC filter: unmapped ratio < 30th percentile among zero-cell samples |
ercc_percentage | QC filter: number of reads mapped to ERCC/total sample mapped reads (reads_ercc/mapped) |
cut_off_ercc | QC filter: ercc percentage < 15th percentile among zero-cell samples |
cut_off_genes | QC filter: number of endogenous genes with at least one molecule (detect_hs) > 85th percentile among zero-cell samples |
ercc_conversion | QC filter: among ERCC, number of molecules/number of mapped reads (mol_ercc/reads_ercc) |
conversion | QC filter: among endogenous genes, number of molecules/number of mapped reads (mol_hs/reads_hs) |
conversion_outlier | QC filter: microscopy detects 1 cell AND ERCC conversion rate > .094 |
filter_all | QC filter: Does the sample pass all the QC filters? cell_number==1, mol_egfp >0, valid_id==1, cut_off_reads==TRUE, cut_off_ercc==TRUE, cut_off_genes==TRUE |
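As a rough illustration of how some of the QC labels relate to one another, the sketch below recomputes two of the ratios and the filter_all flag on a made-up table. The column names follow the label descriptions above. The values are hypothetical, and the filter_all rule is taken directly from its description, so it is an approximation of the original filtering code rather than a copy of it.

```r
# Hypothetical per-well values; in practice these come from pData(eset).
qc <- data.frame(
  cell_number   = c(1, 0, 1, 1),
  mol_egfp      = c(12, 0, 0, 30),
  valid_id      = c(1, 1, 1, 1),
  reads_ercc    = c(200, 900, 150, 100),
  mapped        = c(10000, 1000, 8000, 12000),
  mol_ercc      = c(20, 45, 15, 10),
  cut_off_reads = c(TRUE, FALSE, TRUE, TRUE),
  cut_off_ercc  = c(TRUE, FALSE, TRUE, TRUE),
  cut_off_genes = c(TRUE, FALSE, TRUE, TRUE),
  row.names     = c("A01", "A02", "A03", "A04")
)

# Ratios as given in the label descriptions.
qc$ercc_percentage <- qc$reads_ercc / qc$mapped      # reads_ercc/mapped
qc$ercc_conversion <- qc$mol_ercc / qc$reads_ercc    # mol_ercc/reads_ercc

# A well passes only if every listed criterion holds.
qc$filter_all <- with(qc, cell_number == 1 & mol_egfp > 0 & valid_id == 1 &
                        cut_off_reads & cut_off_ercc & cut_off_genes)

rownames(qc)[qc$filter_all]   # wells that pass all filters
```

With the real object, and assuming filter_all is stored as a logical column, eset[, pData(eset)$filter_all] would keep only the samples that pass.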
View the feature (gene) metadata labels,
varMetadata(featureData(eset))
label | labelDescription |
---|---|
chr | Chromosome |
start | Most 5’ start position (GRCh37/hg19; 1-based; inclusive) |
end | Most 3’ end position (GRCh37/hg19; 1-based; inclusive) |
name | Gene name |
strand | Strand (+ = positive/forward; - = negative/reverse) |
source | Source of RNA |
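Because start and end are 1-based and inclusive, the span of a feature is end - start + 1. The sketch below shows this on a made-up feature table whose columns mirror the labels above; the gene names and coordinates are hypothetical.

```r
# Hypothetical feature metadata; in practice this comes from fData(eset).
feat <- data.frame(
  chr    = c("chr1", "chr1", "chrX"),
  start  = c(100, 5000, 200),
  end    = c(199, 5999, 1200),
  strand = c("+", "-", "+"),
  row.names = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# 1-based inclusive coordinates: both endpoints count toward the span.
feat$span <- feat$end - feat$start + 1

# Select features on one chromosome by matching the chr label.
chr1_genes <- rownames(feat)[feat$chr == "chr1"]

feat
chr1_genes
```

The same indexing works on the ExpressionSet itself, for example eset[fData(eset)$chr == "chr1", ], assuming chromosome names in the real data use that form.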
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)
Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Biobase_2.38.0 BiocGenerics_0.24.0 knitr_1.17
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 digest_0.6.12 rprojroot_1.2 backports_1.0.5
[5] git2r_0.19.0 magrittr_1.5 evaluate_0.10.1 highr_0.6
[9] stringi_1.1.2 rmarkdown_1.8 tools_3.4.1 stringr_1.2.0
[13] yaml_2.1.16 compiler_3.4.1 htmltools_0.3.6
This R Markdown site was created with workflowr