Last updated: 2018-02-01
Code version: 0f846a1
Here is the list of gene symbols that each correspond to multiple Ensembl gene IDs in the data.
I learned that some regions of the genome show substantial variability in the population and consequently can have multiple representations (sequences). These regions are known as “alternate loci”.
I got some of this information from this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5155401/. According to the paper, the GRCh38 patch assembly has a total of 178 alternate-locus-containing regions associated with a total of 261 alternate loci.
For example, TAF9 has two alternate loci, which correspond to two different Ensembl IDs. You can see the locations of the two on this page: http://useast.ensembl.org/Homo_sapiens/Gene/Alleles?db=core;g=ENSG00000273841;r=5:69364743-69370013. Click on “View alleles of this gene on alternate assemblies”.
Another example is TUBB, which corresponds to 8 Ensembl IDs. It is located in the MHC region, which is known to have 8 alternate loci (corresponding to 8 different cell lines; https://www.ncbi.nlm.nih.gov/grc/human/regions/MHC). So it is no surprise that TUBB has 8 Ensembl IDs.
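As a quick sanity check on these two examples, the sketch below counts the distinct Ensembl IDs per symbol. The rows are copied by hand from the TAF9 and TUBB entries in the listing further down this page. The names `example_genes` and `ids_per_symbol` are introduced here for illustration only and are not part of the original analysis.

```r
# Rows copied by hand from the TAF9 and TUBB entries in the listing on this page.
# `example_genes` is a hypothetical name used only for this illustration.
example_genes <- data.frame(
  hgnc = c(rep("TAF9", 2), rep("TUBB", 8)),
  ensembl = c("ENSG00000273841", "ENSG00000276463",
              "ENSG00000232421", "ENSG00000224156", "ENSG00000235067",
              "ENSG00000183311", "ENSG00000229684", "ENSG00000227739",
              "ENSG00000196230", "ENSG00000232575"),
  stringsAsFactors = FALSE
)

# Split the Ensembl IDs by gene symbol and count the distinct IDs in each group
ids_per_symbol <- sapply(
  split(example_genes$ensembl, example_genes$hgnc),
  function(ids) length(unique(ids))
)

print(ids_per_symbol)
```

The counts should agree with the text: two IDs for TAF9 and eight for TUBB.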
Load the cell cycle genes from the Drop-seq paper.
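# Load the gene list (columns: hgnc, phase, ensembl)
cellcycle <- readRDS("../data/cellcycle-genes-previous-studies/rds/macosko-2017.rds")
# Find rows whose (hgnc, phase) pair repeats an earlier row; column 3 (ensembl) is excluded
dup <- which(duplicated(cellcycle[,-3]))
# Show every row for the gene symbols flagged above
cellcycle[cellcycle$hgnc %in% unique(cellcycle$hgnc[dup]),]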
hgnc phase ensembl
81 CCDC84 S ENSG00000280975
82 CCDC84 S ENSG00000186166
114 CDK7 M/G1 ENSG00000134058
115 CDK7 M/G1 ENSG00000277273
132 CFD G2 ENSG00000197766
133 CFD G2 ENSG00000274619
219 FAM189B M/G1 ENSG00000262666
220 FAM189B M/G1 ENSG00000160767
224 FAN1 G2 ENSG00000198690
225 FAN1 G2 ENSG00000276787
231 FOPNL M/G1 ENSG00000276914
232 FOPNL M/G1 ENSG00000133393
289 HRAS G1/S ENSG00000276536
290 HRAS G1/S ENSG00000174775
315 KIAA1147 G1/S ENSG00000257093
316 KIAA1147 G1/S ENSG00000262599
335 KIFC1 G2 ENSG00000204197
336 KIFC1 G2 ENSG00000237649
337 KIFC1 G2 ENSG00000056678
338 KIFC1 G2 ENSG00000233450
373 MDC1 M ENSG00000228575
374 MDC1 M ENSG00000137337
375 MDC1 M ENSG00000225589
376 MDC1 M ENSG00000206481
377 MDC1 M ENSG00000224587
378 MDC1 M ENSG00000234012
379 MDC1 M ENSG00000231135
380 MDC1 M ENSG00000237095
397 MRPS18B M/G1 ENSG00000223775
398 MRPS18B M/G1 ENSG00000226111
399 MRPS18B M/G1 ENSG00000229861
400 MRPS18B M/G1 ENSG00000204568
401 MRPS18B M/G1 ENSG00000203624
402 MRPS18B M/G1 ENSG00000233813
403 MRPS18B M/G1 ENSG00000227420
484 PPP1R10 M ENSG00000238104
485 PPP1R10 M ENSG00000227804
486 PPP1R10 M ENSG00000204569
487 PPP1R10 M ENSG00000230995
488 PPP1R10 M ENSG00000235291
489 PPP1R10 M ENSG00000206489
490 PPP1R10 M ENSG00000231737
558 SMARCB1 M ENSG00000099956
559 SMARCB1 M ENSG00000275837
583 TAF15 G1/S ENSG00000276833
584 TAF15 G1/S ENSG00000270647
585 TAF9 M/G1 ENSG00000273841
586 TAF9 M/G1 ENSG00000276463
624 TUBB G2 ENSG00000232421
625 TUBB G2 ENSG00000224156
626 TUBB G2 ENSG00000235067
627 TUBB G2 ENSG00000183311
628 TUBB G2 ENSG00000229684
629 TUBB G2 ENSG00000227739
630 TUBB G2 ENSG00000196230
631 TUBB G2 ENSG00000232575
646 UBR7 G1/S ENSG00000012963
647 UBR7 G1/S ENSG00000278787
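The selection above flags rows whose (hgnc, phase) pair repeats. A possible alternative, sketched here on the assumption that the data frame has `hgnc` and `ensembl` columns as in the output, is to count the distinct Ensembl IDs per symbol directly. `multi_id_rows` is a hypothetical helper, not part of the original analysis. In the toy input, the CDK7 and HRAS rows are copied from the listing, while the `GENE_X` row is a made-up placeholder added only to show that single-ID symbols are dropped.

```r
# Hypothetical helper (not part of the original analysis): keep every row whose
# gene symbol maps to more than one distinct Ensembl ID.
multi_id_rows <- function(genes) {
  # For each row, the number of distinct Ensembl IDs that share its symbol
  n_ids <- ave(
    seq_along(genes$hgnc), genes$hgnc,
    FUN = function(i) length(unique(genes$ensembl[i]))
  )
  genes[n_ids > 1, , drop = FALSE]
}

# Toy input: CDK7 and HRAS rows are taken from the listing above;
# GENE_X / ENSG_PLACEHOLDER is a made-up single-ID row for illustration.
toy <- data.frame(
  hgnc = c("CDK7", "CDK7", "HRAS", "HRAS", "GENE_X"),
  phase = c("M/G1", "M/G1", "G1/S", "G1/S", "S"),
  ensembl = c("ENSG00000134058", "ENSG00000277273",
              "ENSG00000276536", "ENSG00000174775",
              "ENSG_PLACEHOLDER"),
  stringsAsFactors = FALSE
)

result <- multi_id_rows(toy)
print(result)
```

Counting distinct IDs per symbol would also catch a symbol listed under two different phases and would ignore exact duplicate rows. Whether either case occurs in this gene list has not been checked here.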
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRblas.so
LAPACK: /home/joycehsiao/miniconda3/envs/fucci-seq/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.1 backports_1.0.5 magrittr_1.5 rprojroot_1.2
[5] tools_3.4.1 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.14
[9] stringi_1.1.2 rmarkdown_1.8 knitr_1.17 git2r_0.19.0
[13] stringr_1.2.0 digest_0.6.12 evaluate_0.10.1
This R Markdown site was created with workflowr