
@funktor / bioe_encoding.py (secret gist)
Last active Oct 14, 2018

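```python
def get_sequence_labels(sentences, phrases):
    """Tag every token of every sentence with a B/I/E/O label.

    sentences: list of token lists; phrases: for each sentence, a list of
    phrase chunks, each itself a token list. Every occurrence of a chunk is
    labelled 'B' (first token), 'I' (inner tokens), 'E' (last token); a
    one-token chunk gets just 'B'. All remaining tokens stay 'O'.
    """
    labels = [['O' for word in sent] for sent in sentences]
    for idx in range(len(sentences)):
        sent, phrase = sentences[idx], phrases[idx]
        for chunk in phrase:
            n = len(chunk)
            if n == 0:
                # An empty chunk matches at every position and would corrupt the labels.
                continue
            # Slide a window of length n over the sentence and label each match.
            for start in range(len(sent) - n + 1):
                if sent[start:start + n] == chunk:
                    if n == 1:
                        labels[idx][start] = 'B'
                    else:
                        labels[idx][start + 1:start + n - 1] = ['I'] * (n - 2)
                        labels[idx][start] = 'B'
                        labels[idx][start + n - 1] = 'E'
    return labels
```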
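To show how the labels produced by `get_sequence_labels` are meant to be read, here is a minimal sketch of the inverse step: turning a label sequence back into phrase chunks. `decode_bioe` is a hypothetical helper, not part of the original gist. It assumes the same convention as the encoder: a phrase opens at 'B', runs through 'I' tokens and closes at 'E', and because there is no separate single-token tag, a 'B' that is not followed by 'I' or 'E' is treated as a one-token phrase.

```python
from typing import List, Optional


def decode_bioe(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Recover phrase chunks from B/I/E/O labels (hypothetical helper).

    A lone 'B' (next label is 'B', 'O', or end of sentence) is read as a
    one-token phrase, matching the encoder's n == 1 case.
    """
    phrases: List[List[str]] = []
    current: Optional[List[str]] = None  # tokens of the phrase being built
    for token, label in zip(tokens, labels):
        if label == 'B':
            if current is not None:
                phrases.append(current)  # previous phrase was a lone 'B'
            current = [token]
        elif label == 'I' and current is not None:
            current.append(token)
        elif label == 'E' and current is not None:
            current.append(token)
            phrases.append(current)
            current = None
        else:
            # 'O', or a stray 'I'/'E' with no open phrase: close any open phrase.
            if current is not None:
                phrases.append(current)
                current = None
    if current is not None:
        phrases.append(current)  # sentence ended on a lone 'B'
    return phrases


tokens = ['the', 'New', 'York', 'Times', 'is', 'in', 'Paris']
labels = ['O', 'B', 'I', 'E', 'O', 'O', 'B']
print(decode_bioe(tokens, labels))
# prints [['New', 'York', 'Times'], ['Paris']]
```

Note that the decoder only sees labels, so where the encoder let overlapping matches overwrite each other, the original chunk boundaries cannot always be recovered.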