@lcolladotor
Created December 20, 2016 20:52
Issue with POU5F1 gene at recount
library('TxDb.Hsapiens.UCSC.hg38.knownGene')
library('GenomicFeatures')
library('org.Hs.eg.db')
library('EnsDb.Hsapiens.v79')
library('recount')
library('devtools')
## Get the genes from UCSC hg38 as used in recount
genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene)
## Get the genes for Ensembl v79
genes_ens <- genes(EnsDb.Hsapiens.v79::EnsDb.Hsapiens.v79)
## Find the entrez and ensembl ids for POU5F1
select(org.Hs.eg.db, keys = 'POU5F1', columns = c('SYMBOL', 'ENTREZID', 'ENSEMBL'), keytype = 'SYMBOL')
## Indeed, POU5F1 is not present in UCSC hg38 knownGene
genes['5460']
## However ENSG00000204531 (the id shown at http://www.genecards.org/cgi-bin/carddisp.pl?gene=POU5F1) is
genes_ens['ENSG00000204531']
## The other ensembl ids are present too
genes_ens[select(org.Hs.eg.db, keys = 'POU5F1', columns = 'ENSEMBL', keytype = 'SYMBOL')$ENSEMBL]
## The following code is modified from http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html#using-anothernewer-annotation
## Get the reduced exons based on EnsDb.Hsapiens.v79 which matches hg38
exons <- reproduce_ranges('exon', db = 'EnsDb.Hsapiens.v79')
## Change the chromosome names to match those used in the BigWig files
library('GenomeInfoDb')
seqlevelsStyle(exons) <- 'UCSC'
## Get the count matrix for POU5F1 with ENSEMBL id ENSG00000204531
## (this code can be modified for other ENSEMBL ids)
exons_POU5F1 <- exons[['ENSG00000204531']]
exonMatrix <- coverage_matrix('SRP051472', 'chr6', exons_POU5F1)
dim(exonMatrix)
## Reproducibility info
options(width = 120)
session_info()
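The lookup `genes['5460']` in the script above stops with an error when the id is absent. A minimal sketch of a lookup that reports missing ids instead of failing is shown below. `subset_present` is a hypothetical helper written for this note, not a function from recount or GenomicFeatures, and the toy vector only stands in for a named object such as `genes` or `genes_ens`.

```r
## Hypothetical helper (not part of recount or GenomicFeatures):
## subset a named object by ids, skipping ids that are absent
## instead of raising 'subscript contains invalid names'
subset_present <- function(x, ids) {
    ids <- as.character(ids)
    found <- ids %in% names(x)
    if (!all(found)) {
        message('ids not found: ', paste(ids[!found], collapse = ', '))
    }
    x[ids[found]]
}

## Toy named vector standing in for a named GRanges object
toy <- c(geneA = 1, geneB = 2)
subset_present(toy, c('geneA', 'geneX'))
```

Because the helper relies only on `names()`, `%in%` and `[`, it should also work on the `genes` and `genes_ens` objects from the script, for example `subset_present(genes, '5460')`.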
> library('TxDb.Hsapiens.UCSC.hg38.knownGene')
> library('GenomicFeatures')
> library('org.Hs.eg.db')
> library('EnsDb.Hsapiens.v79')
> library('recount')
> library('devtools')
>
> ## Get the genes from UCSC hg38 as used in recount
> genes <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene)
>
> ## Get the genes for Ensembl v79
> genes_ens <- genes(EnsDb.Hsapiens.v79::EnsDb.Hsapiens.v79)
>
> ## Find the entrez and ensembl ids for POU5F1
> select(org.Hs.eg.db, keys = 'POU5F1', columns = c('SYMBOL', 'ENTREZID', 'ENSEMBL'), keytype = 'SYMBOL')
'select()' returned 1:many mapping between keys and columns
  SYMBOL ENTREZID         ENSEMBL
1 POU5F1     5460 ENSG00000204531
2 POU5F1     5460 ENSG00000230336
3 POU5F1     5460 ENSG00000229094
4 POU5F1     5460 ENSG00000235068
5 POU5F1     5460 ENSG00000237582
6 POU5F1     5460 ENSG00000206454
7 POU5F1     5460 ENSG00000233911
>
> ## Indeed, POU5F1 is not present in UCSC hg38 knownGene
> genes['5460']
Error: subscript contains invalid names
>
> ## However ENSG00000204531 (the id shown at http://www.genecards.org/cgi-bin/carddisp.pl?gene=POU5F1) is
> genes_ens['ENSG00000204531']
GRanges object with 1 range and 6 metadata columns:
                  seqnames               ranges strand |         gene_id   gene_name    entrezid   gene_biotype
                     <Rle>            <IRanges>  <Rle> |     <character> <character> <character>    <character>
  ENSG00000204531        6 [31164337, 31180731]      - | ENSG00000204531      POU5F1        5460 protein_coding
                  seq_coord_system      symbol
                       <character> <character>
  ENSG00000204531       chromosome      POU5F1
  -------
  seqinfo: 319 sequences from GRCh38 genome
>
> ## The other ensembl ids are present too
> genes_ens[select(org.Hs.eg.db, keys = 'POU5F1', columns = 'ENSEMBL', keytype = 'SYMBOL')$ENSEMBL]
'select()' returned 1:many mapping between keys and columns
GRanges object with 7 ranges and 6 metadata columns:
                                  seqnames               ranges strand |         gene_id   gene_name    entrezid
                                     <Rle>            <IRanges>  <Rle> |     <character> <character> <character>
  ENSG00000204531                        6 [31164337, 31180731]      - | ENSG00000204531      POU5F1        5460
  ENSG00000230336 CHR_HSCHR6_MHC_MANN_CTG1 [31209245, 31215620]      - | ENSG00000230336      POU5F1        5460
  ENSG00000229094  CHR_HSCHR6_MHC_DBB_CTG1 [31158060, 31164434]      - | ENSG00000229094      POU5F1        5460
  ENSG00000235068  CHR_HSCHR6_MHC_MCF_CTG1 [31242859, 31249234]      - | ENSG00000235068      POU5F1        5460
  ENSG00000237582 CHR_HSCHR6_MHC_SSTO_CTG1 [31159218, 31165590]      - | ENSG00000237582      POU5F1        5460
  ENSG00000206454  CHR_HSCHR6_MHC_QBL_CTG1 [31156777, 31163150]      - | ENSG00000206454      POU5F1        5460
  ENSG00000233911  CHR_HSCHR6_MHC_COX_CTG1 [31156880, 31163254]      - | ENSG00000233911      POU5F1        5460
                    gene_biotype seq_coord_system      symbol
                     <character>      <character> <character>
  ENSG00000204531 protein_coding       chromosome      POU5F1
  ENSG00000230336 protein_coding       chromosome      POU5F1
  ENSG00000229094 protein_coding       chromosome      POU5F1
  ENSG00000235068 protein_coding       chromosome      POU5F1
  ENSG00000237582 protein_coding       chromosome      POU5F1
  ENSG00000206454 protein_coding       chromosome      POU5F1
  ENSG00000233911 protein_coding       chromosome      POU5F1
  -------
  seqinfo: 319 sequences from GRCh38 genome
>
> ## The following code is modified from http://bioconductor.org/packages/release/bioc/vignettes/recount/inst/doc/recount-quickstart.html#using-anothernewer-annotation
>
> ## Get the reduced exons based on EnsDb.Hsapiens.v79 which matches hg38
> exons <- reproduce_ranges('exon', db = 'EnsDb.Hsapiens.v79')
>
> ## Change the chromosome names to match those used in the BigWig files
> library('GenomeInfoDb')
> seqlevelsStyle(exons) <- 'UCSC'
>
> ## Get the count matrix for POU5F1 with ENSEMBL id ENSG00000204531
> ## (this code can be modified for other ENSEMBL ids)
> exons_POU5F1 <- exons[['ENSG00000204531']]
> exonMatrix <- coverage_matrix('SRP051472', 'chr6', exons_POU5F1)
2016-12-20 15:46:29 railMatrix: processing regions 1 to 5
2016-12-20 15:46:29 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732018.bw
2016-12-20 15:46:29 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732019.bw
2016-12-20 15:46:30 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732020.bw
2016-12-20 15:46:31 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732021.bw
2016-12-20 15:46:31 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732022.bw
2016-12-20 15:46:32 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732023.bw
2016-12-20 15:46:33 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732024.bw
2016-12-20 15:46:33 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732025.bw
2016-12-20 15:46:34 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732026.bw
2016-12-20 15:46:35 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732027.bw
2016-12-20 15:46:35 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732028.bw
2016-12-20 15:46:36 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732029.bw
2016-12-20 15:46:36 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732030.bw
2016-12-20 15:46:37 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732031.bw
2016-12-20 15:46:37 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732032.bw
2016-12-20 15:46:38 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732033.bw
2016-12-20 15:46:39 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732034.bw
2016-12-20 15:46:40 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732035.bw
2016-12-20 15:46:40 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732036.bw
2016-12-20 15:46:41 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732037.bw
2016-12-20 15:46:42 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732038.bw
2016-12-20 15:46:42 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732039.bw
2016-12-20 15:46:43 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732040.bw
2016-12-20 15:46:43 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732041.bw
2016-12-20 15:46:44 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732042.bw
2016-12-20 15:46:44 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732043.bw
2016-12-20 15:46:45 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732044.bw
2016-12-20 15:46:46 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732045.bw
2016-12-20 15:46:46 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732046.bw
2016-12-20 15:46:47 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732047.bw
2016-12-20 15:46:47 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732048.bw
2016-12-20 15:46:49 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732049.bw
2016-12-20 15:46:50 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732050.bw
2016-12-20 15:46:50 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732051.bw
2016-12-20 15:46:51 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732052.bw
2016-12-20 15:46:51 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732053.bw
2016-12-20 15:46:52 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732054.bw
2016-12-20 15:46:52 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732055.bw
2016-12-20 15:46:53 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732056.bw
2016-12-20 15:46:54 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732057.bw
2016-12-20 15:46:55 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732058.bw
2016-12-20 15:46:56 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732059.bw
2016-12-20 15:46:57 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732060.bw
2016-12-20 15:46:58 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732061.bw
2016-12-20 15:46:59 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732062.bw
2016-12-20 15:47:00 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732063.bw
2016-12-20 15:47:00 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732064.bw
2016-12-20 15:47:01 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732065.bw
2016-12-20 15:47:02 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732066.bw
2016-12-20 15:47:03 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732067.bw
2016-12-20 15:47:04 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732068.bw
2016-12-20 15:47:05 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732069.bw
2016-12-20 15:47:05 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732070.bw
2016-12-20 15:47:06 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732071.bw
2016-12-20 15:47:07 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732072.bw
2016-12-20 15:47:08 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732073.bw
2016-12-20 15:47:09 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732074.bw
2016-12-20 15:47:10 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732075.bw
2016-12-20 15:47:11 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732076.bw
2016-12-20 15:47:12 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732077.bw
2016-12-20 15:47:12 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732078.bw
2016-12-20 15:47:13 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732079.bw
2016-12-20 15:47:14 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732080.bw
2016-12-20 15:47:15 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732082.bw
2016-12-20 15:47:16 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732081.bw
2016-12-20 15:47:17 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732083.bw
2016-12-20 15:47:18 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732084.bw
2016-12-20 15:47:19 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732085.bw
2016-12-20 15:47:19 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732086.bw
2016-12-20 15:47:20 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732087.bw
2016-12-20 15:47:21 railMatrix: processing file http://duffel.rail.bio/recount/SRP051472/bw/SRR1732088.bw
There were 50 or more warnings (use warnings() to see the first 50)
> dim(exonMatrix)
[1] 5 71
>
> ## Reproducibility info
> options(width = 120)
> session_info()
Session info -----------------------------------------------------------------------------------------------------------
setting value
version R Under development (unstable) (2016-10-26 r71594)
system x86_64, darwin13.4.0
ui AQUA
language (EN)
collate en_US.UTF-8
tz America/New_York
date 2016-12-20
Packages ---------------------------------------------------------------------------------------------------------------
package * version date source
acepack 1.4.1 2016-10-29 CRAN (R 3.4.0)
AnnotationDbi * 1.37.0 2016-10-26 Bioconductor
AnnotationHub 2.7.6 2016-11-19 Bioconductor
assertthat 0.1 2013-12-06 CRAN (R 3.4.0)
Biobase * 2.35.0 2016-10-23 Bioconductor
BiocGenerics * 0.21.1 2016-12-01 Bioconductor
BiocInstaller 1.25.2 2016-10-25 Bioconductor
BiocParallel 1.9.2 2016-11-18 Bioconductor
biomaRt 2.31.3 2016-12-01 Bioconductor
Biostrings 2.43.1 2016-11-17 Bioconductor
bitops 1.0-6 2013-08-17 CRAN (R 3.4.0)
BSgenome 1.43.1 2016-11-11 Bioconductor
bumphunter 1.15.0 2016-10-23 Bioconductor
cluster 2.0.5 2016-10-08 CRAN (R 3.4.0)
codetools 0.2-15 2016-10-05 CRAN (R 3.4.0)
colorspace 1.3-1 2016-11-18 CRAN (R 3.4.0)
data.table 1.10.0 2016-12-03 CRAN (R 3.4.0)
DBI 0.5-1 2016-09-10 CRAN (R 3.4.0)
derfinder 1.9.5 2016-11-30 Bioconductor
derfinderHelper 1.9.3 2016-11-29 Bioconductor
devtools * 1.12.0 2016-06-24 CRAN (R 3.4.0)
digest 0.6.10 2016-08-02 CRAN (R 3.4.0)
doRNG 1.6 2014-03-07 CRAN (R 3.4.0)
downloader 0.4 2015-07-09 CRAN (R 3.4.0)
EnsDb.Hsapiens.v79 * 2.1.0 2016-11-23 Bioconductor
ensembldb * 1.99.7 2016-12-02 Bioconductor
foreach 1.4.3 2015-10-13 CRAN (R 3.4.0)
foreign 0.8-67 2016-09-13 CRAN (R 3.4.0)
Formula 1.2-1 2015-04-07 CRAN (R 3.4.0)
GenomeInfoDb * 1.11.6 2016-11-17 Bioconductor
GenomicAlignments 1.11.4 2016-12-01 Bioconductor
GenomicFeatures * 1.27.4 2016-12-01 Bioconductor
GenomicFiles 1.11.3 2016-11-29 Bioconductor
GenomicRanges * 1.27.15 2016-12-04 Bioconductor
GEOquery 2.41.0 2016-10-25 Bioconductor
ggplot2 2.2.0 2016-11-11 CRAN (R 3.4.0)
gridExtra 2.2.1 2016-02-29 CRAN (R 3.4.0)
gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
Hmisc 4.0-0 2016-11-01 CRAN (R 3.4.0)
htmlTable 1.7 2016-10-19 CRAN (R 3.4.0)
htmltools 0.3.5 2016-03-21 CRAN (R 3.4.0)
httpuv 1.3.3 2015-08-04 CRAN (R 3.4.0)
httr 1.2.1 2016-07-03 CRAN (R 3.4.0)
interactiveDisplayBase 1.13.0 2016-10-23 Bioconductor
IRanges * 2.9.13 2016-12-01 Bioconductor
iterators 1.0.8 2015-10-13 CRAN (R 3.4.0)
jsonlite 1.1 2016-09-14 CRAN (R 3.4.0)
knitr 1.15.1 2016-11-22 CRAN (R 3.4.0)
lattice 0.20-34 2016-09-06 CRAN (R 3.4.0)
latticeExtra 0.6-28 2016-02-09 CRAN (R 3.4.0)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.0)
magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.4.0)
matrixStats 0.51.0 2016-10-09 CRAN (R 3.4.0)
memoise 1.0.0 2016-01-29 CRAN (R 3.4.0)
mime 0.5 2016-07-07 CRAN (R 3.4.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.4.0)
nnet 7.3-12 2016-02-02 CRAN (R 3.4.0)
org.Hs.eg.db * 3.4.0 2016-11-15 Bioconductor
pkgmaker 0.22 2014-05-14 CRAN (R 3.4.0)
plyr 1.8.4 2016-06-08 CRAN (R 3.4.0)
ProtGenerics 1.7.0 2016-10-23 Bioconductor
qvalue 2.7.0 2016-10-23 Bioconductor
R6 2.2.0 2016-10-05 CRAN (R 3.4.0)
RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.4.0)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.4.0)
RCurl 1.95-4.8 2016-03-01 CRAN (R 3.4.0)
recount * 1.1.7 2016-11-29 Bioconductor
registry 0.3 2015-07-08 CRAN (R 3.4.0)
rentrez 1.0.4 2016-10-26 CRAN (R 3.4.0)
reshape2 1.4.2 2016-10-22 CRAN (R 3.4.0)
rngtools 1.2.4 2014-03-06 CRAN (R 3.4.0)
rpart 4.1-10 2015-06-29 CRAN (R 3.4.0)
Rsamtools 1.27.5 2016-12-01 Bioconductor
RSQLite 1.1 2016-11-27 CRAN (R 3.4.0)
rstudioapi 0.6 2016-06-27 CRAN (R 3.4.0)
rtracklayer 1.35.1 2016-10-29 Bioconductor
S4Vectors * 0.13.5 2016-12-01 Bioconductor
scales 0.4.1 2016-11-09 CRAN (R 3.4.0)
shiny 0.14.2 2016-11-01 CRAN (R 3.4.0)
stringi 1.1.2 2016-10-01 CRAN (R 3.4.0)
stringr 1.1.0 2016-08-19 CRAN (R 3.4.0)
SummarizedExperiment * 1.5.3 2016-11-11 Bioconductor
survival 2.40-1 2016-10-30 CRAN (R 3.4.0)
tibble 1.2 2016-08-26 CRAN (R 3.4.0)
TxDb.Hsapiens.UCSC.hg38.knownGene * 3.4.0 2016-11-15 Bioconductor
VariantAnnotation 1.21.10 2016-12-01 Bioconductor
withr 1.0.2 2016-06-20 CRAN (R 3.4.0)
XML 3.98-1.5 2016-11-10 CRAN (R 3.4.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.4.0)
XVector 0.15.0 2016-10-23 Bioconductor
yaml 2.1.14 2016-11-12 CRAN (R 3.4.0)
zlibbioc 1.21.0 2016-10-23 Bioconductor
>
@ronstewart

Thanks. I know this isn't your work, but I think there is something wrong with the TxDb.Hsapiens.UCSC.hg38.knownGene package. If you go to the UCSC genome browser and choose HG38, POU5F1 is there as a known gene. POU5F1 is an extremely important gene. Anyone interested in embryonic stem cells or even any cell derived from an embryonic stem cell will need to know about this gene. It makes me a bit concerned that other important genes might be missing. As I am just learning R it is rather daunting to think about performing the exon summary described above on all the experiments I want to look at (which is dynamic, so any experiment within recount). Plus I'm concerned that other genes are missing, so I don't really want to do it as a one-off. It might be worth your time to do something like get a download of hg38 directly from UCSC and compare this to the gene list that the TxDb.Hsapiens.UCSC.hg38.knownGene package produces. I have been looking around for ways to access public data programmatically, and recount seems like the best option, but if the gene set is incomplete, it really is problematic.

Like I said, recount seems like the best option. It would be great if the gene set could be in line with what UCSC HG38 actually has.

I don't mean to make extra work for you, but I'm concerned that I don't really have the skills to do what probably needs to be done.

Do you have plans for a recount3?
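
The comparison suggested in the comment above (a gene list obtained from UCSC versus the ids that the TxDb package produces) could be sketched roughly as follows. This is only an illustration under stated assumptions: `compare_ids` is a hypothetical helper, the id vectors are toy placeholders rather than real annotation content, and obtaining the reference gene list from UCSC is not shown.

```r
## Hypothetical helper: compare a reference id list against the ids
## available in an annotation object
compare_ids <- function(reference_ids, annotation_ids) {
    reference_ids <- unique(as.character(reference_ids))
    annotation_ids <- unique(as.character(annotation_ids))
    list(
        ## ids in the reference that the annotation lacks
        missing = setdiff(reference_ids, annotation_ids),
        ## ids in the annotation that the reference lacks
        extra = setdiff(annotation_ids, reference_ids),
        ## ids found in both
        shared = intersect(reference_ids, annotation_ids)
    )
}

## Toy placeholder ids (not real annotation content)
reference_ids <- c('g1', 'g2', 'g3')
annotation_ids <- c('g2', 'g3', 'g4')
res <- compare_ids(reference_ids, annotation_ids)
res$missing
```

With real data, `annotation_ids` could be `names(genes)` from the script in this gist, while `reference_ids` would have to come from a gene table obtained separately and expressed with the same id type (Entrez ids in the case of `genes`).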
