Supervised method learning: finalizing on data while holding out validation samples
- Perform model training on cell times computed in different ways
- Based on FUCCI only
- Based on FUCCI and DAPI, derived from a trigonometric transformation
- Based on FUCCI and DAPI, derived from an algebraic approach
- Compare model training results
- Compute and compare cell time estimates derived under two different assumptions: assuming equal variance in PC1 and PC2 (circle) vs. unequal variances in PC1 and PC2.
- Prediction error for cells from mixed individuals: the prediction error is smaller when based on FUCCI time only than when based on FUCCI and DAPI time, though the difference is very small. The genes selected in the two cases are similar; both were found to have 37 genes enriched for the Cell Cycle GO term (GO:0007049).
- Compiling results
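The two variance assumptions compared above can be illustrated with a minimal sketch. It assumes that cell times are angles computed from the first two principal components of two intensity channels. Under the circle assumption the angle is taken from the raw PC scores; under the unequal-variance assumption each PC is first rescaled to unit standard deviation. The function name `cell_times`, the inputs `gfp` and `rfp`, and the rescaling step are illustrative assumptions, not the analysis code behind these results.

```python
import numpy as np

def cell_times(gfp, rfp, equal_variance=True):
    """Angles (radians, wrapped to [0, 2*pi)) from PC1/PC2 of two channels.

    Hypothetical helper for illustration only.
    """
    X = np.column_stack([gfp, rfp]).astype(float)
    X -= X.mean(axis=0)                        # center each channel
    # PCA via SVD: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt.T                          # columns are PC1 and PC2 scores
    if not equal_variance:
        # Unequal variances: rescale each PC to unit standard deviation
        # so that an elliptical cloud is mapped onto a circle.
        scores = scores / scores.std(axis=0)
    theta = np.arctan2(scores[:, 1], scores[:, 0])
    return np.mod(theta, 2 * np.pi)
```

For points that lie on an elongated ellipse, the rescaled version yields evenly spaced angles, while the circle version compresses angles near the major axis. The sign of each PC is arbitrary, so the direction and origin of the resulting times are arbitrary as well.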
Supervised method learning: building the analysis approach
Approaches to fitting cyclical trend in gene expression data
Cell cycle signal in gene expression data
- We investigated cell cycle signals in the sequencing data alone.
- We then assigned categorical cell cycle labels and explored the expression profiles of these categories.
- We ordered cells on a circle using FUCCI intensities alone.
- We used nonparametric methods to identify genes that may be cyclical along cell cycle phases.
- Fit smash and kernel regression on circular variables to a subset of genes with detection rate > 0.8.
- Fit trendfilter on a subset of genes (5) that were visually observed to have a cyclical pattern. trendfilter is robust to a small proportion of undetected cells, approximately 2 or 3%. In simulations where the proportion of undetected cells was increased to 20%, we observed a flat line in gene expression for genes previously identified as tending toward a cyclical pattern.
- Next, we fit trendfilter on all genes after transforming the data to follow a standard normal distribution. Permutation-based p-values for the proportion of variance explained (PVE) were used to select 101 significant cyclical genes.
- An additional analysis identified the top cyclical genes in each individual. The top 5 are not shared across the six individuals. (Results)
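The permutation-based p-value for PVE mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: a first-harmonic least-squares fit (intercept, cosine, sine) stands in for trendfilter, and the names `pve_first_harmonic` and `permutation_pvalue`, along with the number of permutations, are hypothetical. The idea is to compare the observed PVE with the PVE obtained after shuffling expression values across cell times.

```python
import numpy as np

def pve_first_harmonic(theta, y):
    """PVE of the least-squares fit y ~ 1 + cos(theta) + sin(theta).

    Stand-in for a trendfilter fit; illustration only.
    """
    A = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def permutation_pvalue(theta, y, n_perm=1000, seed=0):
    """Observed PVE and its permutation p-value for one gene."""
    rng = np.random.default_rng(seed)
    observed = pve_first_harmonic(theta, y)
    # Null distribution: shuffling y breaks any link between
    # expression and cell time while keeping its marginal distribution.
    null = np.array([
        pve_first_harmonic(theta, rng.permutation(y)) for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p)
```

Applied gene by gene, the resulting p-values could then be thresholded to select cyclical genes. The smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations limits how stringent a cutoff can be.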
RNA-seq data preprocessing
- The first step in preprocessing RNA-seq data consists of QC and filtering.
- Sample QC and filtering
- Gene QC and filtering
- We then analyzed and corrected for batch effects due to C1 plate in the sequencing data.
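One common form of gene filtering is a detection-rate filter, sketched below. The QC criteria used in this step are not stated on this page, so this is only an assumption: the 0.8 cutoff is borrowed from the smash and kernel regression analysis listed earlier, and the function name `filter_genes_by_detection` and the genes-by-cells layout of `counts` are hypothetical.

```python
import numpy as np

def filter_genes_by_detection(counts, min_rate=0.8):
    """Keep genes detected (count > 0) in more than min_rate of cells.

    counts: 2-D array with genes in rows and cells in columns (assumed layout).
    Returns the filtered matrix and the boolean mask of kept genes.
    """
    counts = np.asarray(counts)
    detection_rate = (counts > 0).mean(axis=1)   # fraction of cells per gene
    keep = detection_rate > min_rate
    return counts[keep], keep
```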
Microscopy image analysis
We evaluated and pre-processed the results of image analysis as follows:
- We visually inspected images detected to have no nucleus or more than one nucleus. Where the detection was inconsistent with visual inspection, we corrected the number of nuclei detected.
- We applied background correction to the intensity measurements of GFP, RFP and DAPI based on the following analyses.
- We analyzed intensity variation across individuals and batches and considered approaches for removing batch effects in the data.
- We investigated the cell time estimates based on FUCCI intensities.
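As a rough illustration of the background-correction step listed above, the sketch below subtracts a per-image background estimate from the foreground intensity and log-transforms the result. The correction actually applied to GFP, RFP and DAPI is described in the linked analyses, not here, so everything in this block is an assumption: the function name `background_correct`, the use of a foreground mean and a background median, and the `floor` that guards against non-positive values.

```python
import numpy as np

def background_correct(fg_mean, bg_median, floor=1.0):
    """log10 of foreground minus background, floored to avoid log of <= 0.

    Hypothetical helper; inputs are per-cell summaries for one channel.
    """
    fg = np.asarray(fg_mean, dtype=float)
    bg = np.asarray(bg_median, dtype=float)
    corrected = np.maximum(fg - bg, floor)   # clip weak signals at the floor
    return np.log10(corrected)
```

The same function would be applied separately to each channel, since background levels can differ between GFP, RFP and DAPI.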
This R Markdown site was created with workflowr