@jrslagle
Created January 1, 2016 10:57
A script to parse the bacterial phylogeny from http://www.bacterio.net/-classifphyla.html and print it in a table format.
# Original text is from http://www.bacterio.net/-classifphyla.html
# Read the hard-coded input file LPSN.txt into an array of lines.
# A missing file raises Errno::ENOENT (Ruby has no `die`), so rescue it and abort.
begin
  lineArray = File.readlines("LPSN.txt")
rescue Errno::ENOENT
  abort "LPSN.txt not found."
end
# TODO: parse the lines differently if it helps separate the genera.
# TODO: grab this data directly from the website.
# TODO: use the included date to report how old the current genus table is.
# phylum = "no phylum"
# clas = "no class"
# order = "no order"
# family = "no family"
# genus = "no genus"
genusNext = false
unclassedNext = false
outFile = open("LPSN-out.txt",'w')
anomoliesFile = open("LPSN-anomolies.txt","w")
lineArray.each do |line|
  # Categorize each line with an exhaustive if/elsif chain:
  #   - rank lines update the phylogenetic labels
  #   - some text lines are ignored
  #   - only a genus line adds one or more lines to the genus table
  # Lines starting with "Phylum"
  if /^Phylum "?(?<phylum>\w+)/ =~ line
    # outFile.write("#{phylum}\n") # write the capture
  elsif /^Class "?(?<clas>\w+)/ =~ line
    # outFile.write("#{clas}\n") # write the capture
  elsif /^Order "?(?<order>\w+)/ =~ line
    # outFile.write("#{order}\n") # write the capture
  elsif /^Family "?(?<family>\w+)/ =~ line
    # the next non-blank line lists the genera of this family
    genusNext = true
    # outFile.write("#{family}\n") # write the capture
  elsif /^Unclassified "?(?<unclassed>\w+)/ =~ line
    genusNext = true
    unclassedNext = true
  elsif genusNext
    # skip blank lines until the genus list appears
    if /\w+/ =~ line
      genusNext = false
      # line is a list of genera
      outFile.write(line) # write the whole line
    end
  else
    # write anomaly lines to their own file.
    anomoliesFile.write(line)
  end
end
outFile.close()
anomoliesFile.close()
# save the table into a file
# include the date.
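
The script above writes the raw genus lines without their ranks, while the commented-out defaults (phylum, clas, order, family) hint at the intended table. The following is a minimal sketch of one way to carry those labels forward; it is not the author's implementation. It keeps the labels in a hash instead of named-capture locals, because Ruby assigns nil to a named capture's variable whenever its match fails, so an if/elsif chain of such regexes would wipe the labels set by earlier lines. The method name build_genus_table, the tab-separated row layout, the "unclassified" family label, and the assumption that genus names on a line are separated by whitespace or commas are all hypothetical.

```ruby
# Hypothetical sketch: one tab-separated row per genus, prefixed by its ranks.
RANKS = %w[Phylum Class Order Family].freeze

def build_genus_table(lines)
  # Same placeholder labels as the commented-out defaults in the script.
  current = RANKS.to_h { |rank| [rank, "no #{rank.downcase}"] }
  genus_next = false
  rows = []

  lines.each do |line|
    if (m = line.match(/^(Phylum|Class|Order|Family) "?(\w+)/))
      rank, name = m[1], m[2]
      current[rank] = name
      # A new higher rank invalidates the lower ranks recorded beneath it.
      RANKS.drop(RANKS.index(rank) + 1).each do |lower|
        current[lower] = "no #{lower.downcase}"
      end
      genus_next = (rank == "Family")
    elsif line =~ /^Unclassified "?\w+/
      # Assumption: genera under an "Unclassified ..." heading get this label.
      current["Family"] = "unclassified"
      genus_next = true
    elsif genus_next && line =~ /\w/
      genus_next = false
      # Assumption: genus names are separated by whitespace and/or commas.
      line.split(/[\s,]+/).reject(&:empty?).each do |genus|
        rows << (RANKS.map { |rank| current[rank] } + [genus]).join("\t")
      end
    end
  end

  rows
end
```

With the input file from the script, the table could then be written with something like `File.write("LPSN-out.txt", build_genus_table(File.readlines("LPSN.txt")).join("\n"))`. Anomaly lines are simply ignored here; routing them to a separate file as the script does would need one more branch.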