@nakao
Last active December 19, 2015 14:09
Fixes invalid ChEMBL-RDF files. Put the script in the ChEMBL-RDF directory and run it. The fixed files are written alongside the originals as "*.ttl.new".
# ChEMBL 16 Turtle files containing iotax: and ncbitax: prefixed names to expand.
files = ["chembl_16_biocmpt.ttl", "chembl_16_target.ttl", "chembl_16_targetcmpt.ttl"]

# Full IRI for an identifiers.org taxonomy ID.
def iotax(arg)
  "<http://identifiers.org/taxonomy/#{arg}>"
end

# Full IRI for an NCBI Taxonomy Browser entry.
def ncbitax(arg)
  "<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=#{arg}>"
end

files.each do |file|
  # Block form closes both files even if an error is raised.
  File.open(file + ".new", "w") do |out|
    File.foreach(file) do |line|
      # gsub replaces every prefixed name on the line, not just the first.
      line = line.gsub(/iotax:(\d+)/) { iotax($1) }
      line = line.gsub(/ncbitax:(\d+)/) { ncbitax($1) }
      out.print(line)
    end
  end
end
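The rewrite the script performs can be tried on a single line without touching any files. The following is a minimal sketch: `expand_tax_prefixes` is a hypothetical helper that is not part of the gist, and the sample triple (including the `ex:` names) is made up for illustration and does not come from the ChEMBL files.

```ruby
# Hypothetical helper (not in the gist): expands both prefixed forms on one line.
def expand_tax_prefixes(line)
  line
    .gsub(/iotax:(\d+)/) { "<http://identifiers.org/taxonomy/#{$1}>" }
    .gsub(/ncbitax:(\d+)/) { "<http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=#{$1}>" }
end

# Made-up sample line; subject and predicates are illustrative only.
sample = "ex:target1 ex:organism iotax:9606 ; ex:seeAlso ncbitax:9606 .\n"
puts expand_tax_prefixes(sample)
```

Each prefixed name is replaced by an absolute IRI in angle brackets, so the line no longer depends on the `iotax:` or `ncbitax:` prefix.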