@walterst
Created April 30, 2015 13:03
Use this script to add taxonomy strings to feature importance scores generated from OTU-level data (see https://gist.github.com/walterst/2222618976a66b3fc8dd).
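#!/usr/bin/env python
# USAGE: python get_matching_taxa.py tab_sep_OTU_table feature_importance_file > feature_imp_with_taxa.txt
from sys import argv

# Map each OTU ID (first column) to its taxonomy string (last column).
otu_taxa = {}
# Python 3 text mode already handles universal newlines; the "U" flag was removed in 3.11.
with open(argv[1]) as otu_table:
    for line in otu_table:
        # Skip comment/header lines and blank lines.
        if line.startswith("#"):
            continue
        if len(line.strip()) == 0:
            continue
        curr_line = line.split("\t")
        otu = curr_line[0]
        taxa = curr_line[-1].strip()
        otu_taxa[otu] = taxa

with open(argv[2]) as importance_table:
    for line in importance_table:
        # Strip the double quotes around fields before splitting on tabs.
        curr_line = line.strip().replace('"', '').split('\t')
        # Header row (starts with "CCD"): append the Taxonomy column name.
        if line.startswith('"CCD"'):
            print('%s\tTaxonomy' % line.strip())
            continue
        # Append the taxonomy with spaces removed, or NA if the OTU ID is unknown.
        try:
            print("%s\t%s" % ("\t".join(curr_line), otu_taxa[curr_line[0]].replace(" ", "")))
        except KeyError:
            print("%s\tNA" % "\t".join(curr_line))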
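
To try the lookup logic without input files, here is a minimal sketch that wraps the same two steps in functions and runs them on in-memory lines. The function names (`build_otu_taxa`, `annotate`) and all sample rows, including the `MeanDecreaseAccuracy` column name, are made up for illustration and are not part of the original script or of any real dataset.

```python
def build_otu_taxa(otu_lines):
    """Map OTU ID (first column) to taxonomy (last column).

    Lines starting with '#' and blank lines are skipped, as in the script.
    """
    otu_taxa = {}
    for line in otu_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        otu_taxa[fields[0]] = fields[-1].strip()
    return otu_taxa


def annotate(importance_lines, otu_taxa):
    """Return the importance rows with a taxonomy column appended."""
    out = []
    for line in importance_lines:
        fields = line.strip().replace('"', "").split("\t")
        # Header row: keep it as-is and add the new column name.
        if line.startswith('"CCD"'):
            out.append("%s\tTaxonomy" % line.strip())
            continue
        # Unknown OTU IDs get NA; spaces are removed from taxonomy strings.
        taxa = otu_taxa.get(fields[0], "NA").replace(" ", "")
        out.append("%s\t%s" % ("\t".join(fields), taxa))
    return out


# Hypothetical sample data (invented for this example).
otu_lines = [
    "# Constructed from biom file\n",
    "#OTU ID\tSampleA\tSampleB\ttaxonomy\n",
    "OTU_1\t5.0\t0.0\tk__Bacteria; p__Firmicutes\n",
    "OTU_2\t1.0\t7.0\tk__Bacteria; p__Bacteroidetes\n",
]
importance_lines = [
    '"CCD"\t"MeanDecreaseAccuracy"\n',
    '"OTU_1"\t0.42\n',
    '"OTU_9"\t0.10\n',
]

annotated = annotate(importance_lines, build_otu_taxa(otu_lines))
for row in annotated:
    print(row)
```

Here `OTU_1` picks up its taxonomy string with spaces stripped, while `OTU_9`, which is absent from the OTU table, is reported as `NA`.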