Data overview

Last updated: 2017-12-13

Code version: 7509725

All processed data are stored as ExpressionSet objects in data/eset. The examples below show how to extract information from an ExpressionSet object.

library(knitr)
library(Biobase)

To combine all ExpressionSet objects in the folder into a single object,

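# List every .rds file in the eset folder (each stores one ExpressionSet)
fname <- Sys.glob("../data/eset/*.rds")
# Read each file, then merge the objects pairwise with combine()
eset <- Reduce(combine, Map(readRDS, fname))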
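To make the `Reduce(combine, ...)` pattern concrete, here is a minimal sketch on two toy objects. The helper `make_toy_eset()`, the second chip ID, and all count values are hypothetical and exist only for illustration; the sketch assumes each file holds the same features for a different set of samples.

```r
library(Biobase)

# Hypothetical helper: build a small ExpressionSet for one toy "chip".
make_toy_eset <- function(chip, wells) {
  genes <- c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
  ids <- paste(chip, wells, sep = "-")
  # Made-up counts; rows are features, columns are samples
  counts <- matrix(seq_len(length(genes) * length(ids)),
                   nrow = length(genes),
                   dimnames = list(genes, ids))
  pheno <- data.frame(experiment = rep(chip, length(ids)),
                      well = wells,
                      row.names = ids,
                      stringsAsFactors = FALSE)
  ExpressionSet(assayData = counts,
                phenoData = AnnotatedDataFrame(pheno))
}

chip_a <- make_toy_eset("20170905", c("A01", "A02"))
chip_b <- make_toy_eset("20170906", c("A01", "A02"))

# Reduce() applies combine() pairwise, left to right, across the list
toy <- Reduce(combine, list(chip_a, chip_b))

dim(exprs(toy))
sampleNames(toy)
```

The combined object keeps the shared features as rows and gains the samples of both inputs as columns, with the sample metadata merged alongside.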

To view the sample metadata labels,

kable(varMetadata(phenoData(eset)))

	labelDescription
experiment	ID of C1 chip (i.e. processing date in YYYYMMDD)
well	Well of C1 chip (96 total, rows A-H, cols 1-12)
cell_number	The number of cells observed in the well via microscopy
concentration	The cDNA concentration of the well prior to library prep
ERCC	The dilution factor of the ERCC spike-ins
individual.1	Individual # 1 included on this C1 chip
individual.2	Individual # 2 included on this C1 chip
image_individual	The chip label for the image files
image_label	The well label for the image files
raw	The number of raw reads
umi	The number of reads with a valid UMI
mapped	The number of reads with a valid UMI that mapped to a genome
unmapped	The number of reads with a valid UMI that did not map to a genome
reads_ercc	The number of reads that mapped to the ERCC spike-in transcripts
reads_hs	The number of reads that mapped to the H. sapiens genome
reads_egfp	The number of reads that mapped to the FUCCI EGFP transgene
reads_mcherry	The number of reads that mapped to the FUCCI mCherry transgene
molecules	The number of molecules (i.e. post UMI-deduplication)
mol_ercc	The number of molecules that mapped to the ERCC spike-in transcripts
mol_hs	The number of molecules that mapped to the H. sapiens genome
mol_egfp	The number of molecules that mapped to the FUCCI EGFP transgene
mol_mcherry	The number of molecules that mapped to the FUCCI mCherry transgene
detect_ercc	The number of ERCC genes with at least one molecule
detect_hs	The number of H. sapiens genes with at least one molecule
chip_id	verifyBamID: The predicted individual based on the sequencing data
chipmix	verifyBamID: chipmix is a metric for detecting sample swaps
freemix	verifyBamID: freemix is a measure of contamination. 0 == good & 0.5 == bad
snps	verifyBamID: The number of SNPs that passed thresholds for AF and missingness
reads	verifyBamID: The number of sequences that overlapped SNPs
avg_dp	verifyBamID: The average sequencing depth that covered a SNP
min_dp	verifyBamID: A minimum depth threshold for QC only (affects snps_w_min)
snps_w_min	verifyBamID: The number of SNPs that had the minimum depth (min_dp); QC only
valid_id	verifyBamID: Is the predicted individual 1 of the 2 added to the C1 chip?
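The columns described above can be used to flag samples. The following is a sketch only: the data frame `pd` is a toy stand-in for `pData(eset)` with made-up values, and the filter (exactly one cell per well and a valid predicted individual) is a hypothetical criterion, not a rule taken from this analysis.

```r
# Toy stand-in for pData(eset); all values are made up for illustration.
pd <- data.frame(
  cell_number = c(1, 0, 1, 2),
  umi         = c(1000, 200, 800, 1500),
  mapped      = c(700, 50, 600, 1200),
  valid_id    = c(TRUE, TRUE, FALSE, TRUE),
  row.names   = c("20170905-A01", "20170905-A02",
                  "20170905-A03", "20170905-A04")
)

# Fraction of reads with a valid UMI that mapped to a genome
pd$frac_mapped <- pd$mapped / pd$umi

# Hypothetical filter: one cell in the well and a valid predicted individual
keep <- pd$cell_number == 1 & pd$valid_id
rownames(pd)[keep]
```

With the real object, the same logical vector could be used to subset samples, for example `eset[, keep]`.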

To view the feature (gene) metadata labels,

kable(varMetadata(featureData(eset)))

	labelDescription
chr	Chromosome
start	Most 5’ start position (GRCh37/hg19; 1-based; inclusive)
end	Most 3’ end position (GRCh37/hg19; 1-based; inclusive)
name	Gene name
strand	Strand (+ = positive/forward; - = negative/reverse)
source	Source of RNA

To extract count data,

exprs(eset)

The raw data contain 20,421 genes and 1,536 single-cell samples.

dim(exprs(eset))

[1] 20421  1536
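Because `exprs()` returns an ordinary matrix with features in rows and samples in columns, per-sample summaries reduce to column operations. A minimal sketch on a made-up 3 x 3 matrix (the counts are hypothetical):

```r
# Toy count matrix standing in for exprs(eset); values are made up.
counts <- matrix(c(0, 3, 1,
                   2, 0, 0,
                   5, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(
                   c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
                   c("20170905-A01", "20170905-A02", "20170905-A03")))

# Total count per sample (column sums)
total_per_cell <- colSums(counts)

# Number of features with at least one count per sample
detected_per_cell <- colSums(counts > 0)

total_per_cell
detected_per_cell
```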

To extract feature/gene information,

fData(eset)

The features include the FUCCI transgenes (EGFP and mCherry), ERCC spike-in controls, and endogenous genes (ENSG IDs).

head(fData(eset))

                 chr     start       end     name strand     source
EGFP            EGFP         1       714     EGFP      +       EGFP
ENSG00000000003  hsX  99883667  99894988   TSPAN6      - H. sapiens
ENSG00000000005  hsX  99839799  99854882     TNMD      + H. sapiens
ENSG00000000419 hs20  49551404  49575092     DPM1      - H. sapiens
ENSG00000000457  hs1 169818772 169863408    SCYL3      - H. sapiens
ENSG00000000460  hs1 169631245 169823221 C1orf112      + H. sapiens
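One way to separate the three feature types is by the pattern of the feature ID. This sketch makes assumptions: only the `EGFP` and `ENSG...` row names appear in the output above, so the `ERCC-00002` and `mCherry` IDs used here are guesses at the naming scheme and should be checked against `rownames(fData(eset))`.

```r
# Toy feature IDs; "ERCC-00002" and "mCherry" are assumed names.
feature_ids <- c("EGFP", "ENSG00000000003", "ENSG00000000005",
                 "ERCC-00002", "mCherry")

# Classify each feature by the prefix of its ID
feature_type <- ifelse(grepl("^ENSG", feature_ids), "endogenous",
                       ifelse(grepl("^ERCC", feature_ids), "ercc",
                              "transgene"))

table(feature_type)
```

With the real object, `eset[grepl("^ENSG", rownames(fData(eset))), ]` would keep only the endogenous genes, under the same naming assumption.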

To extract sample information,

pData(eset)

Each row corresponds to one single-cell sample. Row names combine the experiment date (one C1 plate was processed per day) and the C1 well ID, separated by a hyphen.

head(pData(eset))

             experiment well cell_number concentration         ERCC
20170905-A01   20170905  A01           1     1.7264044 50x dilution
20170905-A02   20170905  A02           1     1.4456926 50x dilution
20170905-A03   20170905  A03           1     1.8896170 50x dilution
20170905-A04   20170905  A04           1     0.4753723 50x dilution
20170905-A05   20170905  A05           1     0.5596827 50x dilution
20170905-A06   20170905  A06           1     2.1353518 50x dilution
             individual.1 individual.2 image_individual image_label
20170905-A01      NA18855      NA18870      18870_18855           3
20170905-A02      NA18855      NA18870      18870_18855           2
20170905-A03      NA18855      NA18870      18870_18855           1
20170905-A04      NA18855      NA18870      18870_18855          49
20170905-A05      NA18855      NA18870      18870_18855          50
20170905-A06      NA18855      NA18870      18870_18855          51
                 raw     umi  mapped unmapped reads_ercc reads_hs
20170905-A01 2734844 1754078 1240443   513635      76433  1163764
20170905-A02 1910671 1254676  861713   392963     120589   740940
20170905-A03 2284182 1571727 1093848   477879     124186   968934
20170905-A04  920518  610742  382426   228316     116552   265873
20170905-A05 1260569  831378  494618   336760     117843   376703
20170905-A06 2501607 1733479 1258695   474784      99172  1158976
             reads_egfp reads_mcherry molecules mol_ercc mol_hs mol_egfp
20170905-A01        246             0    113178     3122 110041       15
20170905-A02        182             2     59545     3143  56390       10
20170905-A03        727             1     74459     3307  71119       32
20170905-A04          1             0     29385     3110  26274        1
20170905-A05         71             1     42407     3059  39343        4
20170905-A06        546             1     94362     3120  91215       26
             mol_mcherry detect_ercc detect_hs chip_id chipmix freemix
20170905-A01           0          39      8390 NA18870 0.12414 0.08025
20170905-A02           2          45      6057 NA18870 0.19067 0.10145
20170905-A03           1          40      6429 NA18855 0.21403 0.08767
20170905-A04           0          42      2746 NA18870 0.45097 0.08319
20170905-A05           1          44      3633 NA18870 0.44775 0.22013
20170905-A06           1          41      7508 NA18870 0.15767 0.11801
               snps reads avg_dp min_dp snps_w_min valid_id
20170905-A01 311848  7959   0.03      1       3503     TRUE
20170905-A02 311848  3651   0.01      1       1802     TRUE
20170905-A03 311848  5059   0.02      1       2159     TRUE
20170905-A04 311848   815   0.00      1        591     TRUE
20170905-A05 311848  1209   0.00      1        780     TRUE
20170905-A06 311848  6722   0.02      1       2815     TRUE
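Since the row names encode the experiment date and the well, they can be split back into their parts when needed. A small sketch in base R; the third ID (`20170906-H12`) is a hypothetical example, and the column names `chip_row` and `chip_col` are introduced here for illustration.

```r
# Sample IDs in the "<experiment>-<well>" format shown above
sample_ids <- c("20170905-A01", "20170905-A02", "20170906-H12")

# Split each ID on the hyphen into experiment and well
parts <- do.call(rbind, strsplit(sample_ids, "-", fixed = TRUE))

parsed <- data.frame(
  experiment = parts[, 1],
  well       = parts[, 2],
  chip_row   = substr(parts[, 2], 1, 1),              # plate row, A-H
  chip_col   = as.integer(substr(parts[, 2], 2, 3)),  # plate column, 1-12
  row.names  = sample_ids,
  stringsAsFactors = FALSE
)

parsed
```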

Session information

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Biobase_2.38.0      BiocGenerics_0.24.0 knitr_1.16         

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14    digest_0.6.12   rprojroot_1.2   backports_1.0.5
 [5] git2r_0.19.0    magrittr_1.5    evaluate_0.10.1 highr_0.6      
 [9] stringi_1.1.2   rmarkdown_1.6   tools_3.4.1     stringr_1.2.0  
[13] yaml_2.1.14     compiler_3.4.1  htmltools_0.3.6

This R Markdown site was created with workflowr

Joyce Hsiao
