@infotroph
Last active September 1, 2016 19:09
I have a set of taxonomies and I want to return a consensus taxonomy masking the levels that aren't shared by all members, e.g.
["Liliopsida;Poales;Poaceae;Elymus;Elymus repens", "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans"]
should produce "Liliopsida;Poales;Poaceae;Elymus;",
but those plus "Liliopsida;Poales;Cyperaceae;Carex;Carex ovalis" should equal "Liliopsida;Poales;;;"
My first stab:
def mask_to_agreement(taxstrings):
    taxlists = [t.split(';') for t in taxstrings]
    taxout = [''] * len(taxlists[0]) # assumes all lists are same length!
    for i in range(0, len(taxlists[0])):
        taxon = taxlists[0][i]
        if [t[i] for t in taxlists].count(taxon) == len(taxlists):
            taxout[i] = taxon
        else:
            break
    return ';'.join(taxout)
This appears to work, but seems awfully baroque -- is there a way to do this as a common-substring operation instead?
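One possible way to phrase this as a common-prefix operation is to zip the split taxonomies rank by rank and stop at the first rank where the names differ. The sketch below is only an illustration: the name `mask_to_agreement_prefix` and the `sep` parameter are made up here, and it assumes a non-empty input whose output should have as many fields as the first taxonomy.

```python
from itertools import takewhile

def mask_to_agreement_prefix(taxstrings, sep=';'):
    """Keep the leading ranks shared by every taxonomy; blank out the rest.

    Hypothetical helper: assumes taxstrings is non-empty.
    """
    taxlists = [t.split(sep) for t in taxstrings]
    # zip(*taxlists) yields one tuple per rank and stops at the shortest list.
    ranks = zip(*taxlists)
    # takewhile stops at the first rank whose names are not all identical.
    shared = [r[0] for r in takewhile(lambda r: len(set(r)) == 1, ranks)]
    # Pad with empty strings so the result has as many fields as the first input.
    padding = [''] * (len(taxlists[0]) - len(shared))
    return sep.join(shared + padding)

print(mask_to_agreement_prefix([
    "Liliopsida;Poales;Poaceae;Elymus;Elymus repens",
    "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans",
]))
```

Because `zip` stops at the shortest list, inputs of unequal length should not raise an error here; ranks beyond the shortest taxonomy are simply treated as not shared.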

infotroph commented Sep 1, 2016

def mask_to_agreement2(taxstrings):
    taxtuples = zip(*[t.split(';') for t in taxstrings])
    taxout = []
    for t in taxtuples:
        if t.count(t[0]) == len(t):
            taxout.append(t[0])
        else:
            taxout.append('')
    return ';'.join(taxout)

>>> a = "A;B;C;D;E"
>>> b = "A;B;C;D;F"
>>> c = "A;B;G;H;I"
>>> mask_to_agreement([a,b])
'A;B;C;D;'
>>> mask_to_agreement([a,b,c])
'A;B;;;'
>>> mask_to_agreement2([a,b])
'A;B;C;D;'
>>> mask_to_agreement2([a,b,c])
'A;B;;;'
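The two versions agree on the examples above, but they do not appear to be equivalent in general: `mask_to_agreement` stops at the first disagreement, while `mask_to_agreement2` blanks only the ranks that disagree. They diverge when a lower rank matches beneath a higher rank that does not. A minimal sketch, with both functions repeated so it runs on its own; the inputs `x` and `y` are made up for illustration:

```python
def mask_to_agreement(taxstrings):
    taxlists = [t.split(';') for t in taxstrings]
    taxout = [''] * len(taxlists[0])  # assumes all lists are same length
    for i in range(len(taxlists[0])):
        taxon = taxlists[0][i]
        if [t[i] for t in taxlists].count(taxon) == len(taxlists):
            taxout[i] = taxon
        else:
            break  # everything after the first mismatch stays blank
    return ';'.join(taxout)

def mask_to_agreement2(taxstrings):
    taxtuples = zip(*[t.split(';') for t in taxstrings])
    taxout = []
    for t in taxtuples:
        # Each rank is judged independently; there is no early exit.
        taxout.append(t[0] if t.count(t[0]) == len(t) else '')
    return ';'.join(taxout)

x = "A;B;C;D;E"
y = "A;Z;C;D;E"  # differs from x only at the second rank
print(mask_to_agreement([x, y]))   # A;;;;
print(mask_to_agreement2([x, y]))  # A;;C;D;E
```

Which behavior is wanted probably depends on whether a match below a mismatched rank is meaningful for the data at hand.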


celoyd commented Sep 1, 2016

a = "Liliopsida;Poales;Poaceae;Elymus;Elymus repens"
b = "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans"
c = "Liliopsida;Poales;Cyperaceae;Carex;Carex ovalis"
d = "WRONG_Liliopsida;Poales;Cyperaceae;INVALID_Carex;Carex ovalis"

def common_head(hierarchies):
  hierarchies = [h.split(';') for h in hierarchies]
  common = hierarchies[0]
  for other in hierarchies[1:]:
    for level in range(len(common)):
      if other[level] != common[level]:
        common = common[:level]
        break
  return common

print(common_head([a, b]))
print(common_head([a, b, c]))
print(common_head([a, b, c, d]))
print(common_head([a, d]))
print(common_head([a, c]))

def mask_to_agreement(taxstrings):
    taxlists = [t.split(';') for t in taxstrings]
    taxout = [''] * len(taxlists[0]) # assumes all lists are same length!
    for i in range(0, len(taxlists[0])):
        taxon = taxlists[0][i]
        if [t[i] for t in taxlists].count(taxon) == len(taxlists):
            taxout[i] = taxon
        else:
            break
    return ';'.join(taxout)
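Note that `common_head` returns a list of the shared leading ranks rather than the padded string asked for in the question. One way to bridge the two is sketched below; `head_to_mask` is a hypothetical helper name, and it assumes every taxonomy has the same number of ranks as the first one.

```python
def common_head(hierarchies):
    hierarchies = [h.split(';') for h in hierarchies]
    common = hierarchies[0]
    for other in hierarchies[1:]:
        for level in range(len(common)):
            if other[level] != common[level]:
                common = common[:level]  # truncate at the first mismatch
                break
    return common

def head_to_mask(taxstrings):
    """Hypothetical wrapper: pad the common head back out to full length."""
    head = common_head(taxstrings)
    n_ranks = len(taxstrings[0].split(';'))  # assumes equal-length taxonomies
    return ';'.join(head + [''] * (n_ranks - len(head)))

a = "Liliopsida;Poales;Poaceae;Elymus;Elymus repens"
b = "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans"
c = "Liliopsida;Poales;Cyperaceae;Carex;Carex ovalis"
print(head_to_mask([a, b]))     # Liliopsida;Poales;Poaceae;Elymus;
print(head_to_mask([a, b, c]))  # Liliopsida;Poales;;;
```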
