@kcranston
Last active August 19, 2016 18:04
Diagnosing encoding issues with OTT
import codecs
import csv

import peyotl.ott as ott

# Path to a local copy of the OTT taxonomy (placeholder; set this before running).
ott_loc = "path-to-ott"
taxonomy = ott.OTT(ott_loc)
ott_names = taxonomy.ott_id_to_names
ott_parents = taxonomy.ott_id2par_ott_id

ott_filename = "ott.csv"
synonym_filename = "synonyms.csv"

# The with block closes both files on exit, so no explicit close() is needed
# (csv.writer objects have no close() method).
with codecs.open(ott_filename, 'w', 'utf-8') as of, \
        codecs.open(synonym_filename, 'w', 'utf-8') as sf:
    ofwriter = csv.writer(of)
    ofwriter.writerow(('id', 'name', 'parent_id'))
    sfwriter = csv.writer(sf)
    sfwriter.writerow(('id', 'name'))
    for ott_id in ott_names:
        name = ott_names[ott_id]
        synonyms = []
        # A tuple is treated as the primary name followed by its synonyms.
        if isinstance(name, tuple):
            # Slice the synonyms before overwriting name with the primary name.
            synonyms = name[1:]
            name = name[0]
        parent_id = ott_parents[ott_id]
        ofwriter.writerow((ott_id, name, parent_id))
        for s in synonyms:
            sfwriter.writerow((ott_id, s))
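
The name/synonym splitting in the script can be checked in isolation, without an OTT download. What follows is a minimal sketch, assuming Python 3 and assuming (as the script does) that an entry is either a single name or a tuple whose first element is the primary name. `split_name_entry`, `write_rows`, and the sample data are hypothetical and are not part of peyotl.

```python
import csv
import io


def split_name_entry(entry):
    """Return (primary_name, synonyms) for one name entry.

    Assumption: an entry is either a plain string or a tuple whose
    first element is the primary name and the rest are synonyms.
    """
    if isinstance(entry, tuple):
        return entry[0], list(entry[1:])
    return entry, []


def write_rows(names, parents, taxa_out, synonyms_out):
    """Write taxon rows and synonym rows to two already-open text streams."""
    taxa_writer = csv.writer(taxa_out)
    taxa_writer.writerow(("id", "name", "parent_id"))
    synonym_writer = csv.writer(synonyms_out)
    synonym_writer.writerow(("id", "name"))
    for ott_id, entry in names.items():
        name, synonyms = split_name_entry(entry)
        taxa_writer.writerow((ott_id, name, parents[ott_id]))
        for synonym in synonyms:
            synonym_writer.writerow((ott_id, synonym))


# Made-up sample data, including a non-ASCII name to exercise encoding.
sample_names = {1: "Aus bus", 2: ("Cäs dus", "Cas dus")}
sample_parents = {1: 0, 2: 1}
taxa_buf, synonym_buf = io.StringIO(), io.StringIO()
write_rows(sample_names, sample_parents, taxa_buf, synonym_buf)
print(taxa_buf.getvalue())
print(synonym_buf.getvalue())
```

To write real files under Python 3, the same `write_rows` call could be given streams opened with `open(path, "w", encoding="utf-8", newline="")`, which is the form the `csv` module documentation recommends for file objects.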