@danielecook
Last active August 29, 2015 13:58
Generates the pairwise ortholog mapping between human and C. elegans genes from NCBI HomoloGene #bash
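# Download the HomoloGene table (tab-separated: group ID, taxonomy ID, Entrez gene ID, symbol, ...)
wget 'ftp://ftp.ncbi.nih.gov/pub/HomoloGene/current/homologene.data'
# Keep group ID, Entrez ID, and symbol for human (9606) and C. elegans (6239), sorted on group ID for join.
# awk compares the taxonomy column directly; "\t" inside an egrep pattern is not reliably a tab.
awk -F'\t' -v OFS='\t' '$2 == 9606 {print $1, $3, $4}' homologene.data | LC_ALL=C sort -t $'\t' -k1,1 > human.txt
awk -F'\t' -v OFS='\t' '$2 == 6239 {print $1, $3, $4}' homologene.data | LC_ALL=C sort -t $'\t' -k1,1 > celegans.txt
# Join on the group ID, drop it, and write the header ahead of the sorted pairs
{ printf 'Human_Entrez\tHuman_Symbol\tElegans_Entrez\tElegans_Symbol\n'; LC_ALL=C join -t $'\t' human.txt celegans.txt | cut -f 2-5 | LC_ALL=C sort; } > orthologs.txt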
rm human.txt celegans.txt homologene.data
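
The same filter, sort, and join steps can be tried on a small table without downloading anything. The sketch below is only an illustration: the rows are invented (they are not real HomoloGene records and carry only the first four columns), `extract_taxon` is a hypothetical helper name, and the taxonomy filter uses awk on column 2, which is a substitution rather than the gist's own command.

```shell
tab=$(printf '\t')
workdir=$(mktemp -d)

# Made-up rows: group ID, taxonomy ID, Entrez ID, symbol (NOT real HomoloGene data)
printf '%s\t%s\t%s\t%s\n' \
  1 9606 1001 HUMAN_A \
  1 6239 2001 worm-a \
  2 9606 1002 HUMAN_B \
  3 9606 1003 HUMAN_C \
  3 6239 2003 worm-c1 \
  3 6239 2004 worm-c2 \
  10 10090 3001 Mouse_x > "$workdir/sample.data"

# Hypothetical helper: print group ID, Entrez ID, symbol for one taxonomy ID,
# sorted on the group ID so that join can consume the result
extract_taxon() {
  awk -F'\t' -v OFS='\t' -v taxid="$1" '$2 == taxid {print $1, $3, $4}' "$2" |
    LC_ALL=C sort -t "$tab" -k1,1
}

extract_taxon 9606 "$workdir/sample.data" > "$workdir/human.txt"
extract_taxon 6239 "$workdir/sample.data" > "$workdir/celegans.txt"

# Join on the group ID, drop it, and put the header first
{
  printf 'Human_Entrez\tHuman_Symbol\tElegans_Entrez\tElegans_Symbol\n'
  LC_ALL=C join -t "$tab" "$workdir/human.txt" "$workdir/celegans.txt" |
    cut -f 2-5 | LC_ALL=C sort
} > "$workdir/orthologs.txt"

cat "$workdir/orthologs.txt"
# Delete "$workdir" when finished
```

In this sample, HUMAN_B has no C. elegans partner in its group and so drops out of the join, while HUMAN_C appears twice because its group holds two worm genes. The output is one row per human/worm pair within a group, not one row per gene.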