@konverner
Created February 1, 2024 00:00
Fix corrupted spans in NER annotation
def fix_span(text: str, span: dict) -> dict:
    """Snap a span's character offsets to word boundaries and refresh its text."""
    delimiters = " .,;:!?"
    fixed_span = span.copy()
    start, end = fixed_span["start"], fixed_span["end"]
    # span starts with a space or punctuation: skip past it
    while start < end and text[start] in delimiters:
        start += 1
    # span is cut at the beginning, e.g. "ashington DC": extend left to the previous space
    while start > 0 and text[start - 1] != " ":
        start -= 1
    # span ends with a space or punctuation, e.g. "Washington DC? ": trim it
    while end > start and text[end - 1] in delimiters:
        end -= 1
    # span is cut at the end, e.g. "Washington D": extend right to the next delimiter
    # (the bound is len(text), so the last character of the text can be included)
    while end < len(text) and text[end] not in delimiters:
        end += 1
    fixed_span["start"], fixed_span["end"] = start, end
    fixed_span["text"] = text[start:end]
    return fixed_span
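
Before repairing annotations it can be useful to see which spans are actually affected. The following is a minimal sketch, not part of the original gist: it assumes spans are dicts with `"start"` and `"end"` character offsets (end exclusive), as in `fix_span`, and it assumes the same delimiter set. The names `DELIMITERS` and `is_corrupted` are hypothetical.

```python
# Hypothetical helper: flags spans whose offsets do not sit on word boundaries.
# Assumption: same delimiter set as fix_span, and "end" is exclusive.
DELIMITERS = " .,;:!?"


def is_corrupted(text: str, span: dict) -> bool:
    """Return True if the span looks cut or padded with spaces/punctuation."""
    start, end = span["start"], span["end"]
    # out-of-range, empty, or inverted span
    if start < 0 or end > len(text) or start >= end:
        return True
    # span begins or ends with a delimiter
    if text[start] in DELIMITERS or text[end - 1] in DELIMITERS:
        return True
    # span starts in the middle of a word (previous character is not a space)
    if start > 0 and text[start - 1] != " ":
        return True
    # span ends in the middle of a word (next character is not a delimiter)
    if end < len(text) and text[end] not in DELIMITERS:
        return True
    return False


text = "I moved to Washington DC? Yes."
spans = [
    {"start": 11, "end": 24},  # "Washington DC"   (clean)
    {"start": 12, "end": 24},  # "ashington DC"    (cut at the beginning)
    {"start": 11, "end": 23},  # "Washington D"    (cut at the end)
    {"start": 11, "end": 26},  # "Washington DC? " (trailing punctuation and space)
]
flags = [is_corrupted(text, s) for s in spans]
print(flags)
```

Only the flagged spans would then need to go through `fix_span`, which keeps already-correct annotations untouched and makes it easy to count how many spans were affected.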