Issue with POU5F1 gene at recount
Gist lcolladotor/374a0de6be5c202bbf216295989e534a, created December 20, 2016 20:52
library('TxDb.Hsapiens.UCSC.hg38.knownGene')
library('GenomicFeatures')
library('org.Hs.eg.db')
library('EnsDb.Hsapiens.v79')
library('recount')
library('devtools')
## Get the genes from the UCSC hg38 knownGene annotation used in recount
genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene)
## Get the genes for Ensembl v79
genes_ens <- genes(EnsDb.Hsapiens.v79::EnsDb.Hsapiens.v79)
## Find the Entrez and Ensembl IDs for POU5F1
select(org.Hs.eg.db, keys = 'POU5F1', columns = c('SYMBOL', 'ENTREZID', 'ENSEMBL'), keytype = 'SYMBOL')
## Indeed, POU5F1 (Entrez ID 5460) is not present in UCSC hg38 knownGene,
## so this line fails with "subscript contains invalid names"
genes['5460']
## However ENSG00000204531 (the ID shown at http://www.genecards.org/cgi-bin/carddisp.pl?gene=POU5F1) is
genes_ens['ENSG00000204531']
## The other Ensembl IDs are present too
genes_ens[select(org.Hs.eg.db, keys = 'POU5F1', columns = 'ENSEMBL', keytype = 'SYMBOL')$ENSEMBL]
## The following code is modified from http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html#using-anothernewer-annotation
## Get the reduced exons based on EnsDb.Hsapiens.v79, which uses the same hg38 (GRCh38) genome build
exons <- reproduce_ranges('exon', db = 'EnsDb.Hsapiens.v79')
## Change the chromosome names to match those used in the BigWig files (e.g. '6' becomes 'chr6')
library('GenomeInfoDb')
seqlevelsStyle(exons) <- 'UCSC'
## Get the count matrix for POU5F1 with Ensembl ID ENSG00000204531
## (to use another Ensembl gene, change the ID here and the chromosome in coverage_matrix())
exons_POU5F1 <- exons[['ENSG00000204531']]
exonMatrix <- coverage_matrix('SRP051472', 'chr6', exons_POU5F1)
dim(exonMatrix)
## Reproducibility info
options(width = 120)
session_info()
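The script uses `reproduce_ranges('exon', ...)` to get "reduced" exons: for each gene, overlapping exons from different transcripts are merged into non-overlapping intervals, so no base is counted twice when coverage is summed over them. The base-R sketch below only illustrates that merging idea; `reduce_intervals()` is a hypothetical helper written for this note, similar in spirit to `IRanges::reduce()` but not the Bioconductor code that recount actually uses.

```r
## Hypothetical helper: merge overlapping or abutting closed intervals
## [start, end] into non-overlapping intervals (illustration only).
reduce_intervals <- function(start, end) {
  stopifnot(length(start) == length(end), all(start <= end))
  if (length(start) == 0) {
    return(data.frame(start = integer(0), end = integer(0)))
  }
  ## Process intervals from left to right
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]
  cur_start <- start[1]
  cur_end <- end[1]
  res_start <- c()
  res_end <- c()
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      ## Overlaps or touches the current interval: extend it
      cur_end <- max(cur_end, end[i])
    } else {
      ## Gap found: store the current interval and start a new one
      res_start <- c(res_start, cur_start)
      res_end <- c(res_end, cur_end)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  data.frame(start = c(res_start, cur_start), end = c(res_end, cur_end))
}

## Three overlapping or abutting exons collapse into one interval;
## the distant fourth exon stays separate
reduce_intervals(start = c(100, 150, 301, 500), end = c(200, 300, 350, 600))
## returns two rows: 100-350 and 500-600
```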
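The `seqlevelsStyle(exons) <- 'UCSC'` step is needed because EnsDb.Hsapiens.v79 names chromosome 6 as `6` (see the `seqnames` column in the transcript that follows), while the recount BigWig files are queried with `chr6`. The hypothetical helper below sketches that renaming for primary chromosomes only; GenomeInfoDb's own mapping is more complete, and this toy version simply returns NA for any other name, such as the MHC alternate-haplotype contigs.

```r
## Hypothetical helper (not GenomeInfoDb's implementation): map Ensembl-style
## primary chromosome names to UCSC-style names.
## Any name outside 1-22, X, Y, MT becomes NA in this toy version.
to_ucsc_chrom <- function(x) {
  primary <- c(as.character(1:22), "X", "Y", "MT")
  out <- ifelse(x %in% primary, paste0("chr", x), NA_character_)
  ## Ensembl calls the mitochondrial genome "MT"; UCSC calls it "chrM"
  out[which(x == "MT")] <- "chrM"
  out
}

to_ucsc_chrom(c("6", "MT", "CHR_HSCHR6_MHC_COX_CTG1"))
## returns c("chr6", "chrM", NA)
```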
> library('TxDb.Hsapiens.UCSC.hg38.knownGene')
> library('GenomicFeatures')
> library('org.Hs.eg.db')
> library('EnsDb.Hsapiens.v79')
> library('recount')
> library('devtools')
>
> ## Get the genes from UCSC hg38 as used in recount
> genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene)
>
> ## Get the genes for Ensembl v79
> genes_ens <- genes(EnsDb.Hsapiens.v79::EnsDb.Hsapiens.v79)
>
> ## Find the entrez and ensembl ids for POU5F1
> select(org.Hs.eg.db, keys = 'POU5F1', columns = c('SYMBOL', 'ENTREZID', 'ENSEMBL'), keytype = 'SYMBOL')
'select()' returned 1:many mapping between keys and columns
SYMBOL ENTREZID ENSEMBL
1 POU5F1 5460 ENSG00000204531
2 POU5F1 5460 ENSG00000230336
3 POU5F1 5460 ENSG00000229094
4 POU5F1 5460 ENSG00000235068
5 POU5F1 5460 ENSG00000237582
6 POU5F1 5460 ENSG00000206454
7 POU5F1 5460 ENSG00000233911
>
> ## Indeed, POU5F1 is not present in UCSC hg38 knownGene
> genes['5460']
Error: subscript contains invalid names
>
> ## However ENSG00000204531 (the id shown at http://www.genecards.org/cgi-bin/carddisp.pl?gene=POU5F1) is
> genes_ens['ENSG00000204531']
GRanges object with 1 range and 6 metadata columns:
seqnames ranges strand | gene_id gene_name entrezid gene_biotype
<Rle> <IRanges> <Rle> | <character> <character> <character> <character>
ENSG00000204531 6 [31164337, 31180731] - | ENSG00000204531 POU5F1 5460 protein_coding
seq_coord_system symbol
<character> <character>
ENSG00000204531 chromosome POU5F1
-------
seqinfo: 319 sequences from GRCh38 genome
>
> ## The other ensembl ids are present too
> genes_ens[select(org.Hs.eg.db, keys = 'POU5F1', columns = 'ENSEMBL', keytype = 'SYMBOL')$ENSEMBL]
'select()' returned 1:many mapping between keys and columns
GRanges object with 7 ranges and 6 metadata columns:
seqnames ranges strand | gene_id gene_name entrezid
<Rle> <IRanges> <Rle> | <character> <character> <character>
ENSG00000204531 6 [31164337, 31180731] - | ENSG00000204531 POU5F1 5460
ENSG00000230336 CHR_HSCHR6_MHC_MANN_CTG1 [31209245, 31215620] - | ENSG00000230336 POU5F1 5460
ENSG00000229094 CHR_HSCHR6_MHC_DBB_CTG1 [31158060, 31164434] - | ENSG00000229094 POU5F1 5460
ENSG00000235068 CHR_HSCHR6_MHC_MCF_CTG1 [31242859, 31249234] - | ENSG00000235068 POU5F1 5460
ENSG00000237582 CHR_HSCHR6_MHC_SSTO_CTG1 [31159218, 31165590] - | ENSG00000237582 POU5F1 5460
ENSG00000206454 CHR_HSCHR6_MHC_QBL_CTG1 [31156777, 31163150] - | ENSG00000206454 POU5F1 5460
ENSG00000233911 CHR_HSCHR6_MHC_COX_CTG1 [31156880, 31163254] - | ENSG00000233911 POU5F1 5460
gene_biotype seq_coord_system symbol
<character> <character> <character>
ENSG00000204531 protein_coding chromosome POU5F1
ENSG00000230336 protein_coding chromosome POU5F1
ENSG00000229094 protein_coding chromosome POU5F1
ENSG00000235068 protein_coding chromosome POU5F1
ENSG00000237582 protein_coding chromosome POU5F1
ENSG00000206454 protein_coding chromosome POU5F1
ENSG00000233911 protein_coding chromosome POU5F1
-------
seqinfo: 319 sequences from GRCh38 genome
>
> ## The following code is modified from http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html#using-anothernewer-annotation
>
> ## Get the reduced exons based on EnsDb.Hsapiens.v79 which matches hg38
> exons <- reproduce_ranges('exon', db = 'EnsDb.Hsapiens.v79')
>
> ## Change the chromosome names to match those used in the BigWig files
> library('GenomeInfoDb')
> seqlevelsStyle(exons) <- 'UCSC'
>
> ## Get the count matrix for POU5F1 with ENSEMBL id ENSG00000204531
> ## (this code can be modified for other ENSEMBL ids)
> exons_POU5F1 <- exons[['ENSG00000204531']]
> exonMatrix <- coverage_matrix('SRP051472', 'chr6', exons_POU5F1)
2016-12-20 15:46:29 railMatrix: processing regions 1 to 5
2016-12-20 15:46:29 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732018.bw
2016-12-20 15:46:29 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732019.bw
2016-12-20 15:46:30 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732020.bw
2016-12-20 15:46:31 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732021.bw
2016-12-20 15:46:31 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732022.bw
2016-12-20 15:46:32 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732023.bw
2016-12-20 15:46:33 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732024.bw
2016-12-20 15:46:33 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732025.bw
2016-12-20 15:46:34 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732026.bw
2016-12-20 15:46:35 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732027.bw
2016-12-20 15:46:35 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732028.bw
2016-12-20 15:46:36 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732029.bw
2016-12-20 15:46:36 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732030.bw
2016-12-20 15:46:37 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732031.bw
2016-12-20 15:46:37 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732032.bw
2016-12-20 15:46:38 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732033.bw
2016-12-20 15:46:39 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732034.bw
2016-12-20 15:46:40 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732035.bw
2016-12-20 15:46:40 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732036.bw
2016-12-20 15:46:41 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732037.bw
2016-12-20 15:46:42 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732038.bw
2016-12-20 15:46:42 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732039.bw
2016-12-20 15:46:43 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732040.bw
2016-12-20 15:46:43 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732041.bw
2016-12-20 15:46:44 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732042.bw
2016-12-20 15:46:44 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732043.bw
2016-12-20 15:46:45 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732044.bw
2016-12-20 15:46:46 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732045.bw
2016-12-20 15:46:46 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732046.bw
2016-12-20 15:46:47 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732047.bw
2016-12-20 15:46:47 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732048.bw
2016-12-20 15:46:49 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732049.bw
2016-12-20 15:46:50 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732050.bw
2016-12-20 15:46:50 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732051.bw
2016-12-20 15:46:51 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732052.bw
2016-12-20 15:46:51 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732053.bw
2016-12-20 15:46:52 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732054.bw
2016-12-20 15:46:52 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732055.bw
2016-12-20 15:46:53 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732056.bw
2016-12-20 15:46:54 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732057.bw
2016-12-20 15:46:55 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732058.bw
2016-12-20 15:46:56 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732059.bw
2016-12-20 15:46:57 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732060.bw
2016-12-20 15:46:58 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732061.bw
2016-12-20 15:46:59 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732062.bw
2016-12-20 15:47:00 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732063.bw
2016-12-20 15:47:00 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732064.bw
2016-12-20 15:47:01 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732065.bw
2016-12-20 15:47:02 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732066.bw
2016-12-20 15:47:03 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732067.bw
2016-12-20 15:47:04 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732068.bw
2016-12-20 15:47:05 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732069.bw
2016-12-20 15:47:05 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732070.bw
2016-12-20 15:47:06 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732071.bw
2016-12-20 15:47:07 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732072.bw
2016-12-20 15:47:08 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732073.bw
2016-12-20 15:47:09 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732074.bw
2016-12-20 15:47:10 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732075.bw
2016-12-20 15:47:11 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732076.bw
2016-12-20 15:47:12 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732077.bw
2016-12-20 15:47:12 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732078.bw
2016-12-20 15:47:13 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732079.bw
2016-12-20 15:47:14 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732080.bw
2016-12-20 15:47:15 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732082.bw
2016-12-20 15:47:16 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732081.bw
2016-12-20 15:47:17 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732083.bw
2016-12-20 15:47:18 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732084.bw
2016-12-20 15:47:19 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732085.bw
2016-12-20 15:47:19 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732086.bw
2016-12-20 15:47:20 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732087.bw
2016-12-20 15:47:21 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732088.bw
There were 50 or more warnings (use warnings() to see the first 50)
> dim(exonMatrix)
[1] 5 71
>
> ## Reproducibility info
> options(width = 120)
> session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R Under development (unstable) (2016-10-26 r71594)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-12-20
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.4.0)
AnnotationDbi * 1.37.0 2016-10-26 Bioconductor
AnnotationHub 2.7.6 2016-11-19 Bioconductor
assertthat 0.1 2013-12-06 CRAN (R 3.4.0)
Biobase * 2.35.0 2016-10-23 Bioconductor
BiocGenerics * 0.21.1 2016-12-01 Bioconductor
BiocInstaller 1.25.2 2016-10-25 Bioconductor
BiocParallel 1.9.2 2016-11-18 Bioconductor
biomaRt 2.31.3 2016-12-01 Bioconductor
Biostrings 2.43.1 2016-11-17 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.4.0)
BSgenome 1.43.1 2016-11-11 Bioconductor
bumphunter 1.15.0 2016-10-23 Bioconductor
cluster 2.0.5 2016-10-08 CRAN (R 3.4.0)
codetools 0.2-15 2016-10-05 CRAN (R 3.4.0)
colorspace 1.3-1 2016-11-18 CRAN (R 3.4.0)
data.table 1.10.0 2016-12-03 CRAN (R 3.4.0)
DBI 0.5-1 2016-09-10 CRAN (R 3.4.0)
derfinder 1.9.5 2016-11-30 Bioconductor
derfinderHelper 1.9.3 2016-11-29 Bioconductor
devtools * 1.12.0 2016-06-24 CRAN (R 3.4.0)
digest 0.6.10 2016-08-02 CRAN (R 3.4.0)
doRNG 1.6 2014-03-07 CRAN (R 3.4.0)
downloader 0.4 2015-07-09 CRAN (R 3.4.0)
EnsDb.Hsapiens.v79 * 2.1.0 2016-11-23 Bioconductor
ensembldb * 1.99.7 2016-12-02 Bioconductor
foreach 1.4.3 2015-10-13 CRAN (R 3.4.0)
foreign 0.8-67 2016-09-13 CRAN (R 3.4.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.4.0)
GenomeInfoDb * 1.11.6 2016-11-17 Bioconductor
GenomicAlignments 1.11.4 2016-12-01 Bioconductor
GenomicFeatures * 1.27.4 2016-12-01 Bioconductor
GenomicFiles 1.11.3 2016-11-29 Bioconductor
GenomicRanges * 1.27.15 2016-12-04 Bioconductor
GEOquery 2.41.0 2016-10-25 Bioconductor
ggplot2 2.2.0 2016-11-11 CRAN (R 3.4.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.4.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
Hmisc 4.0-0 2016-11-01 CRAN (R 3.4.0)
htmlTable 1.7 2016-10-19 CRAN (R 3.4.0)
htmltools 0.3.5 2016-03-21 CRAN (R 3.4.0)
httpuv 1.3.3 2015-08-04 CRAN (R 3.4.0)
httr 1.2.1 2016-07-03 CRAN (R 3.4.0)
interactiveDisplayBase 1.13.0 2016-10-23 Bioconductor
IRanges * 2.9.13 2016-12-01 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.4.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.4.0)
knitr 1.15.1 2016-11-22 CRAN (R 3.4.0)
lattice 0.20-34 2016-09-06 CRAN (R 3.4.0)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.4.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.4.0)
matrixStats 0.51.0 2016-10-09 CRAN (R 3.4.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.4.0)
mime 0.5 2016-07-07 CRAN (R 3.4.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.4.0)
org.Hs.eg.db * 3.4.0 2016-11-15 Bioconductor
pkgmaker 0.22 2014-05-14 CRAN (R 3.4.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
ProtGenerics 1.7.0 2016-10-23 Bioconductor
qvalue 2.7.0 2016-10-23 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.4.0)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.4.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.4.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.4.0)
recount * 1.1.7 2016-11-29 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.4.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.4.0)
reshape2 1.4.2 2016-10-22 CRAN (R 3.4.0)
rngtools 1.2.4 2014-03-06 CRAN (R 3.4.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.4.0)
Rsamtools 1.27.5 2016-12-01 Bioconductor
RSQLite 1.1 2016-11-27 CRAN (R 3.4.0)
rstudioapi 0.6 2016-06-27 CRAN (R 3.4.0)
rtracklayer 1.35.1 2016-10-29 Bioconductor
S4Vectors * 0.13.5 2016-12-01 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.4.0)
shiny 0.14.2 2016-11-01 CRAN (R 3.4.0)
stringi 1.1.2 2016-10-01 CRAN (R 3.4.0)
stringr 1.1.0 2016-08-19 CRAN (R 3.4.0)
SummarizedExperiment * 1.5.3 2016-11-11 Bioconductor
survival 2.40-1 2016-10-30 CRAN (R 3.4.0)
tibble 1.2 2016-08-26 CRAN (R 3.4.0)
TxDb.Hsapiens.UCSC.hg38.knownGene * 3.4.0 2016-11-15 Bioconductor
VariantAnnotation 1.21.10 2016-12-01 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.4.0)
XML 3.98-1.5 2016-11-10 CRAN (R 3.4.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.4.0)
XVector 0.15.0 2016-10-23 Bioconductor
yaml 2.1.14 2016-11-12 CRAN (R 3.4.0)
zlibbioc 1.21.0 2016-10-23 Bioconductor
>
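The 1:many `select()` mapping in the transcript is explained by the `seqnames` column: only ENSG00000204531 lies on chromosome 6 of the primary assembly, while the other six IDs sit on alternate-haplotype contigs of the MHC region (`CHR_HSCHR6_MHC_*`). Since the script queries `chr6`, ENSG00000204531 is the relevant ID. The sketch below filters the IDs, with values copied from the transcript, down to primary chromosomes; `keep_primary()` is a hypothetical helper, and with live data you would apply the same test to `as.character(seqnames(...))` of the GRanges object.

```r
## POU5F1 Ensembl IDs and their seqnames, copied from the genes_ens output above
pou5f1 <- data.frame(
  gene_id = c("ENSG00000204531", "ENSG00000230336", "ENSG00000229094",
              "ENSG00000235068", "ENSG00000237582", "ENSG00000206454",
              "ENSG00000233911"),
  seqname = c("6", "CHR_HSCHR6_MHC_MANN_CTG1", "CHR_HSCHR6_MHC_DBB_CTG1",
              "CHR_HSCHR6_MHC_MCF_CTG1", "CHR_HSCHR6_MHC_SSTO_CTG1",
              "CHR_HSCHR6_MHC_QBL_CTG1", "CHR_HSCHR6_MHC_COX_CTG1"),
  stringsAsFactors = FALSE
)

## Hypothetical helper: keep only rows on primary-assembly chromosomes
## (Ensembl-style names 1-22, X, Y, MT)
keep_primary <- function(df, primary = c(as.character(1:22), "X", "Y", "MT")) {
  df[df$seqname %in% primary, , drop = FALSE]
}

keep_primary(pou5f1)$gene_id
## returns "ENSG00000204531"
```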
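The `exonMatrix` above has one row per reduced POU5F1 exon (5) and one column per sample (71). To repeat this for several projects, as asked in the comment that follows, one option is to wrap the retrieval in a function and sum the exon rows into one gene-level value per sample; because the exons are reduced, the rows do not overlap. `gene_totals_by_project()` and its `fetch` argument are hypothetical: `fetch` must return a numeric exon-by-sample matrix, and how you extract that from `coverage_matrix()` depends on your recount version (the commented example assumes it returns a RangedSummarizedExperiment).

```r
## Hypothetical helper: for each project, fetch an exon-by-sample numeric
## matrix and collapse it to one total per sample (column).
gene_totals_by_project <- function(projects, fetch) {
  totals <- lapply(projects, function(project) {
    m <- fetch(project)
    stopifnot(is.matrix(m), is.numeric(m))
    ## Reduced exons do not overlap, so summing rows counts each base once
    colSums(m)
  })
  names(totals) <- projects
  totals
}

## Example with recount (not run; assumes coverage_matrix() returns a
## RangedSummarizedExperiment and that exons_POU5F1 exists as in the script):
## fetch_pou5f1 <- function(project) {
##   SummarizedExperiment::assay(
##     recount::coverage_matrix(project, 'chr6', exons_POU5F1))
## }
## gene_totals_by_project(c('SRP051472'), fetch_pou5f1)
```

Passing `fetch` as an argument keeps the loop independent of any one recount version and makes it easy to try on a small fake matrix first.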
Thanks. I know this isn't your work, but I think there is something wrong with the TxDb.Hsapiens.UCSC.hg38.knownGene package. If you go to the UCSC Genome Browser and choose hg38, POU5F1 is there as a known gene. POU5F1 is an extremely important gene: anyone interested in embryonic stem cells, or in any cell derived from embryonic stem cells, will need to know about it. That makes me a bit concerned that other important genes might be missing.

As I am just learning R, it is rather daunting to think about performing the exon summary described above on all the experiments I want to look at (the set changes, so it could be any experiment within recount). And because other genes might be missing too, I don't really want to do it as a one-off. It might be worth your time to download the hg38 gene list directly from UCSC and compare it to the gene list that the TxDb.Hsapiens.UCSC.hg38.knownGene package produces. I have been looking around for ways to access public data programmatically, and recount seems like the best option, but if the gene set is incomplete, that is really problematic.
Like I said, recount seems like the best option. It would be great if the gene set could match what UCSC hg38 actually has.
I don't mean to make extra work for you, but I'm concerned that I don't have the skills to do what probably needs to be done.
Do you have plans for a recount3?
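The comparison suggested in the comment above (a UCSC hg38 gene list versus the genes in TxDb.Hsapiens.UCSC.hg38.knownGene) reduces to a set difference on gene symbols. The sketch below assumes both lists are already character vectors of symbols; `missing_symbols()` is a hypothetical helper, and obtaining the UCSC list is not shown. The commented lines show one possible way to get symbols for the TxDb genes, whose names are Entrez IDs (as `genes['5460']` in the script suggests).

```r
## Hypothetical helper: symbols in `reference` that are absent from `candidate`
missing_symbols <- function(reference, candidate) {
  ## Drop missing and empty symbols, and duplicates, before comparing
  reference <- unique(reference[!is.na(reference) & nzchar(reference)])
  sort(setdiff(reference, candidate))
}

## Illustrative inputs only (GENE_A and GENE_B are placeholders)
missing_symbols(c("POU5F1", "GENE_A", "GENE_B", NA), c("GENE_A", "GENE_B"))
## returns "POU5F1"

## Candidate symbols from the TxDb package (not run; requires Bioconductor):
## library('TxDb.Hsapiens.UCSC.hg38.knownGene')
## library('org.Hs.eg.db')
## entrez <- names(GenomicFeatures::genes(TxDb.Hsapiens.UCSC.hg38.knownGene))
## candidate <- AnnotationDbi::mapIds(org.Hs.eg.db, keys = entrez,
##   column = 'SYMBOL', keytype = 'ENTREZID')
```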