Last updated: 2017-12-13
Code version: 7509725
All processed data are stored as ExpressionSet objects in data/eset. The examples below show how to extract information from an ExpressionSet object.
library(knitr)
library(Biobase)
To combine all ExpressionSet objects in the folder,
fname <- Sys.glob("../data/eset/*.rds")
eset <- Reduce(combine, Map(readRDS, fname))
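The read-and-combine pattern above can be tried on toy data first. The following is a minimal sketch, not the project's actual pipeline: `make_toy_eset`, the temporary folder, and the second chip ID are hypothetical and exist only for illustration. It relies on `combine` merging ExpressionSet objects that share feature names but have distinct sample names.

```r
library(Biobase)

# Hypothetical helper: build a tiny ExpressionSet for one C1 chip.
make_toy_eset <- function(chip, wells = c("A01", "A02")) {
  genes <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
  samples <- paste(chip, wells, sep = "-")
  counts <- matrix(seq_len(length(genes) * length(samples)),
                   nrow = length(genes),
                   dimnames = list(genes, samples))
  pheno <- data.frame(experiment = rep(chip, length(wells)),
                      well = wells,
                      row.names = samples,
                      stringsAsFactors = FALSE)
  labels <- data.frame(labelDescription = c("ID of C1 chip", "Well of C1 chip"),
                       row.names = colnames(pheno),
                       stringsAsFactors = FALSE)
  ExpressionSet(assayData = counts,
                phenoData = AnnotatedDataFrame(data = pheno, varMetadata = labels))
}

# Write one .rds file per chip into a temporary folder (assumed layout).
tmp <- file.path(tempdir(), "toy_eset")
dir.create(tmp, showWarnings = FALSE)
for (chip in c("20170905", "20170906")) {
  saveRDS(make_toy_eset(chip), file.path(tmp, paste0(chip, ".rds")))
}

# Same pattern as above: read every file, then merge the objects pairwise.
fname <- Sys.glob(file.path(tmp, "*.rds"))
eset_toy <- Reduce(combine, Map(readRDS, fname))

dim(exprs(eset_toy))      # features x samples
sampleNames(eset_toy)
```

Because `Reduce` applies `combine` two objects at a time, the same call works for any number of files in the folder.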
To view the sample metadata labels,
kable(varMetadata(phenoData(eset)))
 | labelDescription |
---|---|
experiment | ID of C1 chip (i.e. processing date in YYYYMMDD) |
well | Well of C1 chip (96 total, rows A-H, cols 1-12) |
cell_number | The number of cells observed in the well via microscopy |
concentration | The cDNA concentration of the well prior to library prep |
ERCC | The dilution factor of the ERCC spike-ins |
individual.1 | Individual # 1 included on this C1 chip |
individual.2 | Individual # 2 included on this C1 chip |
image_individual | The chip label for the image files |
image_label | The well label for the image files |
raw | The number of raw reads |
umi | The number of reads with a valid UMI |
mapped | The number of reads with a valid UMI that mapped to a genome |
unmapped | The number of reads with a valid UMI that did not map to a genome |
reads_ercc | The number of reads that mapped to the ERCC spike-in transcripts |
reads_hs | The number of reads that mapped to the H. sapiens genome |
reads_egfp | The number of reads that mapped to the FUCCI EGFP transgene |
reads_mcherry | The number of reads that mapped to the FUCCI mCherry transgene |
molecules | The number of molecules (i.e. post UMI-deduplication) |
mol_ercc | The number of molecules that mapped to the ERCC spike-in transcripts |
mol_hs | The number of molecules that mapped to the H. sapiens genome |
mol_egfp | The number of molecules that mapped to the FUCCI EGFP transgene |
mol_mcherry | The number of molecules that mapped to the FUCCI mCherry transgene |
detect_ercc | The number of ERCC genes with at least one molecule |
detect_hs | The number of H. sapiens genes with at least one molecule |
chip_id | verifyBamID: The predicted individual based on the sequencing data |
chipmix | verifyBamID: chipmix is a metric for detecting sample swaps |
freemix | verifyBamID: freemix is a measure of contamination. 0 == good & 0.5 == bad |
snps | verifyBamID: The number of SNPs that passed thresholds for AF and missingness |
reads | verifyBamID: The number of sequences that overlapped SNPs |
avg_dp | verifyBamID: The average sequencing depth that covered a SNP |
min_dp | verifyBamID: A minimum depth threshold for QC only (affects snps_w_min) |
snps_w_min | verifyBamID: The number of SNPs that had the minimum depth (min_dp); QC only |
valid_id | verifyBamID: Is the predicted individual 1 of the 2 added to the C1 chip? |
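To look up the description of a single column rather than printing the whole table, the `varMetadata` data frame can be indexed by column name. The sketch below uses a toy `AnnotatedDataFrame` with two of the columns listed above; the values are made up for illustration.

```r
library(Biobase)

# Toy sample table with two of the columns described above (values invented).
pheno <- data.frame(cell_number = c(1L, 0L, 2L),
                    freemix = c(0.08, 0.10, 0.22),
                    row.names = c("20170905-A01", "20170905-A02", "20170905-A03"))

# One description per column; row names must match the column names of pheno.
labels <- data.frame(
  labelDescription = c("The number of cells observed in the well via microscopy",
                       "verifyBamID: freemix is a measure of contamination"),
  row.names = colnames(pheno),
  stringsAsFactors = FALSE)

adf <- AnnotatedDataFrame(data = pheno, varMetadata = labels)

# Look up the description of one column by name.
desc <- varMetadata(adf)["freemix", "labelDescription"]
desc
```

With the real data, the same indexing should apply to `varMetadata(phenoData(eset))`.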
To view the feature (gene) metadata labels,
kable(varMetadata(featureData(eset)))
 | labelDescription |
---|---|
chr | Chromosome |
start | Most 5’ start position (GRCh37/hg19; 1-based; inclusive) |
end | Most 3’ end position (GRCh37/hg19; 1-based; inclusive) |
name | Gene name |
strand | Strand (+ = positive/forward; - = negative/reverse) |
source | Source of RNA |
To extract count data,
exprs(eset)
There are 20,421 genes and 1,536 single cell samples in the raw data.
dim(exprs(eset))
[1] 20421 1536
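Since `exprs()` returns an ordinary matrix, per-sample summaries follow from base R functions. The sketch below uses a toy count matrix (values invented) to compute, for each sample, the total count over endogenous genes and the number of endogenous genes with at least one count. It assumes endogenous genes can be recognized by the `ENSG` prefix of their row names; it is not claimed to reproduce the `mol_hs` or `detect_hs` columns exactly.

```r
library(Biobase)

# Toy count matrix: one transgene and three endogenous genes, three samples.
counts <- matrix(c(0, 3, 1,
                   5, 0, 0,
                   2, 2, 0,
                   7, 1, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("EGFP", "ENSG00000000003",
                                   "ENSG00000000005", "ENSG00000000419"),
                                 c("20170905-A01", "20170905-A02", "20170905-A03")))
eset_toy <- ExpressionSet(assayData = counts)

m <- exprs(eset_toy)
m[1:2, 1:2]   # peek at a corner instead of printing the full matrix

# Assumption: endogenous genes are the rows whose names start with "ENSG".
is_hs <- grepl("^ENSG", rownames(m))

# Total count per sample over endogenous genes.
total_hs <- colSums(m[is_hs, , drop = FALSE])

# Number of endogenous genes with at least one count per sample.
detected_hs <- colSums(m[is_hs, , drop = FALSE] > 0)
```

On the full data set, indexing a corner such as `exprs(eset)[1:5, 1:5]` avoids printing all 20,421 rows.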
To extract feature/gene information,
fData(eset)
The features include FUCCI transgenes (EGFP and mCherry), ERCC spike-in controls, and endogenous genes (ENSG).
head(fData(eset))
chr start end name strand source
EGFP EGFP 1 714 EGFP + EGFP
ENSG00000000003 hsX 99883667 99894988 TSPAN6 - H. sapiens
ENSG00000000005 hsX 99839799 99854882 TNMD + H. sapiens
ENSG00000000419 hs20 49551404 49575092 DPM1 - H. sapiens
ENSG00000000457 hs1 169818772 169863408 SCYL3 - H. sapiens
ENSG00000000460 hs1 169631245 169823221 C1orf112 + H. sapiens
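The feature table can also drive subsetting of the whole object. The sketch below builds a toy ExpressionSet from a few of the rows shown above (the counts are invented), keeps only features whose `source` is `H. sapiens`, and looks up an Ensembl ID by gene name.

```r
library(Biobase)

# Toy feature table, using a subset of the columns and rows shown above.
feat <- data.frame(
  chr = c("EGFP", "hsX", "hsX", "hs20"),
  name = c("EGFP", "TSPAN6", "TNMD", "DPM1"),
  source = c("EGFP", "H. sapiens", "H. sapiens", "H. sapiens"),
  row.names = c("EGFP", "ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  stringsAsFactors = FALSE)

# Invented counts for two samples.
counts <- matrix(1:8, nrow = 4,
                 dimnames = list(rownames(feat),
                                 c("20170905-A01", "20170905-A02")))

eset_toy <- ExpressionSet(assayData = counts,
                          featureData = AnnotatedDataFrame(feat))

# Keep only endogenous genes; counts and feature data are subset together.
eset_hs <- eset_toy[fData(eset_toy)$source == "H. sapiens", ]
featureNames(eset_hs)

# Look up the Ensembl ID for a gene name.
gene_id <- rownames(fData(eset_toy))[fData(eset_toy)$name == "DPM1"]
gene_id
```

Subsetting with `[` keeps the count matrix and the feature table in sync, which is the main reason to filter the ExpressionSet itself rather than the extracted matrix.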
To extract sample information,
pData(eset)
Each row is a single-cell sample. Row names combine the experiment date (one C1 plate was processed per day) and the C1 well ID.
head(pData(eset))
experiment well cell_number concentration ERCC
20170905-A01 20170905 A01 1 1.7264044 50x dilution
20170905-A02 20170905 A02 1 1.4456926 50x dilution
20170905-A03 20170905 A03 1 1.8896170 50x dilution
20170905-A04 20170905 A04 1 0.4753723 50x dilution
20170905-A05 20170905 A05 1 0.5596827 50x dilution
20170905-A06 20170905 A06 1 2.1353518 50x dilution
individual.1 individual.2 image_individual image_label
20170905-A01 NA18855 NA18870 18870_18855 3
20170905-A02 NA18855 NA18870 18870_18855 2
20170905-A03 NA18855 NA18870 18870_18855 1
20170905-A04 NA18855 NA18870 18870_18855 49
20170905-A05 NA18855 NA18870 18870_18855 50
20170905-A06 NA18855 NA18870 18870_18855 51
raw umi mapped unmapped reads_ercc reads_hs
20170905-A01 2734844 1754078 1240443 513635 76433 1163764
20170905-A02 1910671 1254676 861713 392963 120589 740940
20170905-A03 2284182 1571727 1093848 477879 124186 968934
20170905-A04 920518 610742 382426 228316 116552 265873
20170905-A05 1260569 831378 494618 336760 117843 376703
20170905-A06 2501607 1733479 1258695 474784 99172 1158976
reads_egfp reads_mcherry molecules mol_ercc mol_hs mol_egfp
20170905-A01 246 0 113178 3122 110041 15
20170905-A02 182 2 59545 3143 56390 10
20170905-A03 727 1 74459 3307 71119 32
20170905-A04 1 0 29385 3110 26274 1
20170905-A05 71 1 42407 3059 39343 4
20170905-A06 546 1 94362 3120 91215 26
mol_mcherry detect_ercc detect_hs chip_id chipmix freemix
20170905-A01 0 39 8390 NA18870 0.12414 0.08025
20170905-A02 2 45 6057 NA18870 0.19067 0.10145
20170905-A03 1 40 6429 NA18855 0.21403 0.08767
20170905-A04 0 42 2746 NA18870 0.45097 0.08319
20170905-A05 1 44 3633 NA18870 0.44775 0.22013
20170905-A06 1 41 7508 NA18870 0.15767 0.11801
snps reads avg_dp min_dp snps_w_min valid_id
20170905-A01 311848 7959 0.03 1 3503 TRUE
20170905-A02 311848 3651 0.01 1 1802 TRUE
20170905-A03 311848 5059 0.02 1 2159 TRUE
20170905-A04 311848 815 0.00 1 591 TRUE
20170905-A05 311848 1209 0.00 1 780 TRUE
20170905-A06 311848 6722 0.02 1 2815 TRUE
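Sample metadata columns can be used to select samples. The sketch below, on toy data with invented values, keeps wells with exactly one observed cell and a valid predicted individual. These two criteria are illustrative only and are not a statement of the filtering applied in this project.

```r
library(Biobase)

# Toy sample table with two of the columns shown above (values invented).
pheno <- data.frame(
  cell_number = c(1L, 1L, 0L, 2L),
  valid_id = c(TRUE, FALSE, TRUE, TRUE),
  row.names = c("20170905-A01", "20170905-A02", "20170905-A03", "20170905-A04"))

# Invented counts for two genes.
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("ENSG00000000003", "ENSG00000000005"),
                                 rownames(pheno)))

eset_toy <- ExpressionSet(assayData = counts,
                          phenoData = AnnotatedDataFrame(pheno))

# `$` on an ExpressionSet reaches into the sample table, like pData(eset_toy)$...
keep <- eset_toy$cell_number == 1 & eset_toy$valid_id

# Column subsetting keeps counts and sample metadata in sync.
eset_keep <- eset_toy[, keep]
sampleNames(eset_keep)

# Tabulate a metadata column across all samples.
table(eset_toy$cell_number)
```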
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)
Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Biobase_2.38.0 BiocGenerics_0.24.0 knitr_1.16
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 digest_0.6.12 rprojroot_1.2 backports_1.0.5
[5] git2r_0.19.0 magrittr_1.5 evaluate_0.10.1 highr_0.6
[9] stringi_1.1.2 rmarkdown_1.6 tools_3.4.1 stringr_1.2.0
[13] yaml_2.1.14 compiler_3.4.1 htmltools_0.3.6