@cthoyt
Last active April 17, 2020 13:23
Remapping organism names in BioGRID identifiers dump

BioGRID Identifiers Problem, Solved

The file I included here lists the organisms in the BioGRID identifiers download (version 3.5.184, the latest as of writing on 2020-04-17) whose ORGANISM_OFFICIAL_NAME is not correct. I mapped all of them using a mixture of synonym search on NCBITaxon and manual intervention. Each entry has the taxonomy identifier, so it can be used to look up the most up-to-date information.

I would highly suggest including a taxonomy ID in this dump as well as the name, so it can be programmatically mapped by anyone trying to integrate this data with other sources.

taxonomy_remapping = {  # correct name in comment after
    "Canis familiaris": "9615",  # Canis lupus familiaris
    "Human Herpesvirus 1": "10298",  # Human alphaherpesvirus 1
    "Human Herpesvirus 3": "10335",  # Human alphaherpesvirus 3
    "Murid Herpesvirus 1": "10366",  # Murid betaherpesvirus 1
    "Human Herpesvirus 4": "10376",  # Human gammaherpesvirus 4
    "Hepatitus C Virus": "11103",  # Hepacivirus C
    "Human Immunodeficiency Virus 1": "11676",  # Human immunodeficiency virus 1
    "Human Immunodeficiency Virus 2": "11709",  # Human immunodeficiency virus 2
    "Human Herpesvirus 2": "10310",  # Human alphaherpesvirus 2
    "Human Herpesvirus 5": "10359",  # Human betaherpesvirus 5
    "Human Herpesvirus 6A": "32603",  # Human betaherpesvirus 6A
    "Human Herpesvirus 6B": "32604",  # Human betaherpesvirus 6B
    "Human Herpesvirus 7": "10372",  # Human betaherpesvirus 7
    "Human Herpesvirus 8": "37296",  # Human gammaherpesvirus 8
    "Emericella nidulans": "162425",  # Aspergillus nidulans
    "Bassica campestris": "145471",  # Brassica rapa subsp. oleifera (was a typo)
    "Tarsius syrichta": "1868482",  # Carlito syrichta
    "Felis Catus": "9685",  # Felis catus
    "Vaccinia Virus": "10245",  # Vaccinia virus
    "Simian Virus 40": "1891767",  # Macaca mulatta polyomavirus 1
    "Simian Immunodeficiency Virus": "11723",  # Simian immunodeficiency virus
    "Tobacco Mosaic Virus": "12242",  # Tobacco mosaic virus
}
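
As a rough sketch of how this mapping might be applied, the snippet below adds a taxonomy identifier to rows read from a tab-separated file. Everything except the ORGANISM_OFFICIAL_NAME column and the mapping entries is an assumption made for illustration: the helper names, the TAXONOMY_ID output column, and the toy rows are hypothetical, and the excerpt repeats only three entries so the example runs on its own.

```python
import csv
import io

# Excerpt of the mapping above, repeated so this sketch is self-contained.
taxonomy_remapping_excerpt = {
    "Canis familiaris": "9615",  # Canis lupus familiaris
    "Felis Catus": "9685",  # Felis catus
    "Hepatitus C Virus": "11103",  # Hepacivirus C
}


def get_taxonomy_id(organism_name, remapping):
    """Return the taxonomy identifier for a remapped name, or None if it is not in the mapping."""
    return remapping.get(organism_name)


def add_taxonomy_column(rows, remapping, name_column="ORGANISM_OFFICIAL_NAME"):
    """Yield a copy of each row with a hypothetical TAXONOMY_ID column added."""
    for row in rows:
        enriched = dict(row)
        enriched["TAXONOMY_ID"] = get_taxonomy_id(row[name_column], remapping)
        yield enriched


# Toy tab-separated data (hypothetical rows, not taken from the real download).
example_tsv = (
    "BIOGRID_ID\tORGANISM_OFFICIAL_NAME\n"
    "1\tCanis familiaris\n"
    "2\tHomo sapiens\n"
)
reader = csv.DictReader(io.StringIO(example_tsv), delimiter="\t")
enriched_rows = list(add_taxonomy_column(reader, taxonomy_remapping_excerpt))
```

Names that are already correct in the dump, such as Homo sapiens, are not in this mapping and come back as None here, so they would still need an ordinary name lookup against NCBITaxon.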