@yk-tanigawa
Last active July 30, 2021 17:10
Ensembl BioMart

  • Go to the Ensembl portal and select BioMart (the file names below assume the GRCh37.p13 assembly and the Ensembl Genes 104 dataset)
  • Select the following attributes:
    • Chromosome/scaffold name
    • Gene start (bp)
    • Gene end (bp)
    • Gene stable ID
    • Gene stable ID version
    • Gene name
    • Gene type
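  • Export the results to a TSV file (the commands below assume it is saved as mart_export.txt).
  • Sort the rows by chromosome, start, and end, keeping the header line first (the -V version-sort key requires GNU sort):
    • awk 'NR==1' mart_export.txt > EnsemblBioMart.GRCh37p13.EnsemblGenes104.tsv
    • awk 'NR>1' mart_export.txt | sort -k1,1V -k2,2n -k3,3n >> EnsemblBioMart.GRCh37p13.EnsemblGenes104.tsv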
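  • Edit the header line so that it starts with # (tabix skips lines beginning with # by default) and the column names contain no spaces; keep the fields tab-separated:
    • #Chromosome Gene_start Gene_end Gene_stable_ID Gene_stable_ID_version Gene_symbol Gene_type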
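  • Compress the sorted file with bgzip and index it with tabix (sequence name in column 1, start in column 2, end in column 3):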
    • bgzip -l9 EnsemblBioMart.GRCh37p13.EnsemblGenes104.tsv
    • tabix -s1 -b2 -e3 EnsemblBioMart.GRCh37p13.EnsemblGenes104.tsv.gz
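
The sort, header, and compress/index steps above can be sketched as a single script. This is a minimal sketch, not the author's exact workflow: it writes the edited header directly instead of editing it by hand, passes an explicit tab delimiter to sort, and uses bgzip -c so the uncompressed file is kept. The sample rows (GENE_A, ENSG_PLACEHOLDER_A, and so on) are hypothetical placeholders written only when no mart_export.txt is present; they are not real Ensembl records. GNU sort is assumed for the -V key.

```shell
#!/bin/sh
set -eu

in=mart_export.txt
out=EnsemblBioMart.GRCh37p13.EnsemblGenes104.tsv
tab=$(printf '\t')

# Hypothetical sample export, created only when no real BioMart export exists.
# printf reuses the 7-field format for each group of 7 arguments.
if [ ! -f "$in" ]; then
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    'Chromosome/scaffold name' 'Gene start (bp)' 'Gene end (bp)' \
    'Gene stable ID' 'Gene stable ID version' 'Gene name' 'Gene type' \
    10 500 900 ENSG_PLACEHOLDER_C ENSG_PLACEHOLDER_C.1 GENE_C protein_coding \
    2 100 200 ENSG_PLACEHOLDER_B ENSG_PLACEHOLDER_B.1 GENE_B protein_coding \
    1 3000 4000 ENSG_PLACEHOLDER_D ENSG_PLACEHOLDER_D.1 GENE_D lincRNA \
    1 1000 2000 ENSG_PLACEHOLDER_A ENSG_PLACEHOLDER_A.1 GENE_A protein_coding \
    > "$in"
fi

# Write the edited header: leading '#', no spaces in names, tab-separated.
printf '#Chromosome\tGene_start\tGene_end\tGene_stable_ID\tGene_stable_ID_version\tGene_symbol\tGene_type\n' > "$out"

# Drop the original header, then sort by chromosome (version order), start, end.
tail -n +2 "$in" | sort -t "$tab" -k1,1V -k2,2n -k3,3n >> "$out"

# Compress and index only if the htslib tools are installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
  bgzip -l9 -c "$out" > "$out.gz"       # -c writes to stdout, keeping the .tsv
  tabix -f -s1 -b2 -e3 "$out.gz"        # -f overwrites an existing index
fi
```

Once the index exists, a region query such as `tabix EnsemblBioMart.GRCh37p13.EnsemblGenes104.tsv.gz 1:1000-2000` should return the genes overlapping that interval; adding `-h` also prints the header line.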