@bkutlu
Last active August 2, 2018 18:23
Prepare a table of annotations for human genes using Bioconductor
# This can surely be done in several ways, but here is a nice example of using Reduce to merge a list of tables.
# The use case here is building a lookup table for human genes.
# See Marc Carlson's AnnotationDbi packages and tutorials for alternative (SQL-based) ways of retrieving this information.
# Caution:
# an unintended consequence of joining on the Entrez Gene IDs is that the Ensembl IDs for alternative IDs are lost, and usually
# the alternative Ensembl ID is used.
# Note also that an Entrez Gene ID mapped to more than one Ensembl ID appears on several rows (gene_id 10000, AKT3, in the output).
# Load org.Hs.eg.db from the Bioconductor project, plus the tidyverse for %>% and tibbles
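library("org.Hs.eg.db")
library("tidyverse")
# Left-join the Ensembl, symbol, and gene-name tables on the Entrez Gene ID (gene_id)
Reduce(function(...) merge(..., by = "gene_id", all.x = TRUE),
       list(toTable(org.Hs.egENSEMBL2EG),
            toTable(org.Hs.egSYMBOL),
            toTable(org.Hs.egGENENAME))) %>%
  as_tibble()  # as.tibble() is deprecated in current tibble releases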
# A tibble: 28,964 x 4
  gene_id   ensembl_id      symbol  gene_name
  <chr>     <chr>           <chr>   <chr>
1 1         ENSG00000121410 A1BG    alpha-1-B glycoprotein
2 10        ENSG00000156006 NAT2    N-acetyltransferase 2
3 100       ENSG00000196839 ADA     adenosine deaminase
4 1000      ENSG00000170558 CDH2    cadherin 2
5 10000     ENSG00000117020 AKT3    AKT serine/threonine kinase 3
6 10000     ENSG00000275199 AKT3    AKT serine/threonine kinase 3
7 100008586 ENSG00000236362 GAGE12F G antigen 12F
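# A minimal base-R sketch of the same Reduce + merge pattern on toy data, so it runs without Bioconductor.
# The three data frames and every ID in them are made up for illustration; they only mimic the shape of the
# toTable() results. The sketch shows two side effects of a left join on gene_id: a gene with two Ensembl IDs
# takes two rows, and a gene missing from the leftmost table is dropped.

```r
# Toy stand-ins for the three annotation tables (hypothetical IDs, not real genes).
ensembl <- data.frame(gene_id    = c("1", "2", "2"),
                      ensembl_id = c("ENS_A", "ENS_B", "ENS_C"),
                      stringsAsFactors = FALSE)
symbols <- data.frame(gene_id = c("1", "2", "3"),
                      symbol  = c("GENE1", "GENE2", "GENE3"),
                      stringsAsFactors = FALSE)
gene_names <- data.frame(gene_id   = c("1", "2"),
                         gene_name = c("first gene", "second gene"),
                         stringsAsFactors = FALSE)

# Reduce folds merge() over the list from left to right, i.e.
# merge(merge(ensembl, symbols), gene_names), each step a left join on gene_id.
lookup <- Reduce(function(x, y) merge(x, y, by = "gene_id", all.x = TRUE),
                 list(ensembl, symbols, gene_names))

print(lookup)

# gene_id "2" has two Ensembl IDs, so it occupies two rows.
# gene_id "3" is absent from the leftmost table, so the left join drops it.
```

# Putting a different table first in the list changes which gene_ids survive, because all.x = TRUE keeps
# only the keys of the left-hand table at each step.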