Last active
September 1, 2016 19:09
I have a set of taxonomies and I want to return a consensus taxonomy that masks the levels not shared by all members. For example,
["Liliopsida;Poales;Poaceae;Elymus;Elymus repens", "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans"]
should produce "Liliopsida;Poales;Poaceae;Elymus;",
but those plus "Liliopsida;Poales;Cyperaceae;Carex;Carex ovalis" should produce "Liliopsida;Poales;;;".
My first stab:
def mask_to_agreement(taxstrings):
    taxlists = [t.split(';') for t in taxstrings]
    taxout = [''] * len(taxlists[0])  # assumes all lists are same length!
    for i in range(0, len(taxlists[0])):
        taxon = taxlists[0][i]
        if [t[i] for t in taxlists].count(taxon) == len(taxlists):
            taxout[i] = taxon
        else:
            break
    return ';'.join(taxout)
This appears to work, but it seems awfully baroque. Is there a way to do this as a common-substring operation instead?
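One way to phrase this as a common-prefix operation is to transpose the split lists with `zip` and keep ranks only while every member agrees. The sketch below is a hedged illustration rather than a definitive answer: the name `mask_to_agreement_zip` and the `sep` parameter are made up for this example, and it keeps the same assumption as the first stab, namely that every taxonomy string has the same number of levels.

```python
from itertools import takewhile


def mask_to_agreement_zip(taxstrings, sep=';'):
    """Keep the leading ranks shared by all strings; blank out the rest.

    Hypothetical helper for illustration; assumes every string has the
    same number of sep-delimited levels.
    """
    taxlists = [t.split(sep) for t in taxstrings]
    # zip(*taxlists) yields one tuple per rank, e.g. ('Poales', 'Poales').
    # takewhile stops at the first rank where the members disagree.
    agreed = takewhile(lambda ranks: len(set(ranks)) == 1, zip(*taxlists))
    shared = [ranks[0] for ranks in agreed]
    # Pad with empty strings so the output keeps the original level count.
    width = len(taxlists[0])
    return sep.join(shared + [''] * (width - len(shared)))


a = "Liliopsida;Poales;Poaceae;Elymus;Elymus repens"
b = "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans"
c = "Liliopsida;Poales;Cyperaceae;Carex;Carex ovalis"

print(mask_to_agreement_zip([a, b]))     # Liliopsida;Poales;Poaceae;Elymus;
print(mask_to_agreement_zip([a, b, c]))  # Liliopsida;Poales;;;
```

Because `takewhile` stops at the first disagreement, a rank that happens to match again further down (such as a shared species epithet under different genera) stays masked, which matches the `break` in the loop version.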
infotroph (author) commented on Sep 1, 2016:
a = "Liliopsida;Poales;Poaceae;Elymus;Elymus repens"
b = "Liliopsida;Poales;Poaceae;Elymus;Elymus nutans"
c = "Liliopsida;Poales;Cyperaceae;Carex;Carex ovalis"
d = "WRONG_Liliopsida;Poales;Cyperaceae;INVALID_Carex;Carex ovalis"
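def common_head(hierarchies):
    # Split each string into its list of levels.
    hierarchies = [h.split(';') for h in hierarchies]
    # Start from the first hierarchy and trim it against each of the others.
    # Like the first stab, this assumes all lists are the same length.
    common = hierarchies[0]
    for other in hierarchies[1:]:
        for level in range(len(common)):
            if other[level] != common[level]:
                # Keep only the levels before the first disagreement.
                common = common[:level]
                break
    return common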
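print(common_head([a, b]))        # ['Liliopsida', 'Poales', 'Poaceae', 'Elymus']
print(common_head([a, b, c]))     # ['Liliopsida', 'Poales']
print(common_head([a, b, c, d]))  # []
print(common_head([a, d]))        # []
print(common_head([a, c]))        # ['Liliopsida', 'Poales']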
def mask_to_agreement(taxstrings):
    taxlists = [t.split(';') for t in taxstrings]
    taxout = [''] * len(taxlists[0])  # assumes all lists are same length!
    for i in range(0, len(taxlists[0])):
        taxon = taxlists[0][i]
        if [t[i] for t in taxlists].count(taxon) == len(taxlists):
            taxout[i] = taxon
        else:
            break
    return ';'.join(taxout)