Extract the positions of given spans within a given text. Used as a precursor to converting named entities identified by other processes into spaCy's NER format.
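from typing import List, Tuple


def extract_span_start_end_positions(
    text: str, spans: List[str]
) -> List[Tuple[str, int, int]]:
    """
    Extract positions of indicated spans from indicated text.
    Adapted from : https://www.programcreek.com/python/?CodeExample=convert+to+spans

    Args:
        text: The string to be searched.
        spans: The spans of interest within the string, in the order they
            appear in the text. Each can be a single word or multiple
            contiguous words.

    Returns:
        A list of (span, start, end) tuples mapping each span to its
        character indices in the text, so that text[start:end] == span.
        Spans that cannot be found are skipped.
    """
    cur_idx = 0
    spans_w_positions = []
    for span in spans:
        start = text.find(span, cur_idx)
        if start == -1:
            # str.find returns -1 when the span is absent; skip it rather
            # than record a bogus position.
            continue
        end = start + len(span)
        spans_w_positions.append((span, start, end))
        # Resume searching after this match so repeated spans map to
        # successive occurrences.
        cur_idx = end
    return spans_w_positions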
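
A minimal sketch of the follow-on conversion step mentioned in the description, under stated assumptions: the helper name `to_entity_annotations`, the `(span, label)` input pairs, and the sample sentence are hypothetical and not part of the original gist. The output follows the `(text, {"entities": [(start, end, label)]})` tuple layout used for spaCy v2-style NER training data. The helper repeats the same forward `str.find` search so the block runs on its own.

```python
from typing import Dict, List, Tuple


def to_entity_annotations(
    text: str, labelled_spans: List[Tuple[str, str]]
) -> Tuple[str, Dict[str, List[Tuple[int, int, str]]]]:
    """Hypothetical helper: locate each (span, label) pair in the text and
    return a (text, {"entities": [...]}) tuple with character offsets."""
    entities = []
    cur_idx = 0
    for span, label in labelled_spans:
        start = text.find(span, cur_idx)
        if start == -1:
            # Span not found at or after cur_idx; skip it.
            continue
        end = start + len(span)
        entities.append((start, end, label))
        cur_idx = end
    return (text, {"entities": entities})


# Illustrative input; the sentence and labels are made up for this example.
sample_text = "Apple is opening an office in Singapore next year."
sample_spans = [("Apple", "ORG"), ("Singapore", "GPE")]
example = to_entity_annotations(sample_text, sample_spans)
print(example)
```

Because the search only moves forward, the pairs are assumed to be listed in the order the spans occur in the text; a span listed out of order would be skipped.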