@allanj
Last active March 29, 2021 14:37
Convert the tags from IOB1 to IOB2 tagging scheme
"""
IOB1: O I I B I
IOB2: O B I B I
"""
from typing import List


def iob2(tags: List[str]) -> bool:
    """
    Check that tags have a valid IOB format.
    Tags in IOB1 format are converted to IOB2 in place.
    Returns False at the first malformed tag (earlier tags may already
    have been converted), True otherwise.
    """
    for i, tag in enumerate(tags):
        if tag == 'O':
            continue
        split = tag.split('-')
        # A valid tag is exactly '<I|B>-<TYPE>'; types containing '-' are rejected
        if len(split) != 2 or split[0] not in ['I', 'B']:
            return False
        if split[0] == 'B':
            continue
        elif i == 0 or tags[i - 1] == 'O':  # conversion IOB1 to IOB2
            tags[i] = 'B' + tag[1:]
        elif tags[i - 1][1:] == tag[1:]:  # same entity type continues
            continue
        else:  # entity type changed: conversion IOB1 to IOB2
            tags[i] = 'B' + tag[1:]
    return True
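
Because the function above rewrites the list it is given and only reports success as a boolean, a caller who wants to keep the original tags has to copy them first. A minimal sketch of a non-mutating variant follows. The name `to_iob2` and the choice to return `None` for malformed input are assumptions made for illustration and are not part of the gist. It also splits on the first hyphen only, so a type such as `MISC-X` is accepted here although the original rejects it.

```python
from typing import List, Optional


def to_iob2(tags: List[str]) -> Optional[List[str]]:
    """Return a new IOB2 list, or None if any tag is malformed (hypothetical helper)."""
    converted: List[str] = []
    for tag in tags:
        if tag == 'O':
            converted.append(tag)
            continue
        # Split on the first hyphen only: 'I-PER' -> ('I', '-', 'PER')
        prefix, sep, label = tag.partition('-')
        if prefix not in ('I', 'B') or not sep or not label:
            return None
        prev = converted[-1] if converted else 'O'
        # An I- tag opens a new entity at the sequence start, after 'O',
        # or after a tag of a different type, so it becomes B- in IOB2.
        if prefix == 'I' and (prev == 'O' or prev[2:] != label):
            prefix = 'B'
        converted.append(prefix + '-' + label)
    return converted


if __name__ == '__main__':
    print(to_iob2(['O', 'I-PER', 'I-PER', 'B-PER', 'I-PER']))
    # ['O', 'B-PER', 'I-PER', 'B-PER', 'I-PER']
```

The input list is left untouched, so the IOB1 and IOB2 versions can be compared side by side.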