@ducalpha
Created November 2, 2018 04:10
Sentencize a policy
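from typing import List

import spacy
from unidecode import unidecode

# 'en' was a spaCy 2.x shortcut link; the full model name also works on newer versions.
nlp = spacy.load('en_core_web_sm')
# Sentence boundaries come from the dependency parser, so NER can be dropped for speed.
nlp.remove_pipe('ner')


def sentencized(line: str) -> List[str]:
    """Split one line of text into sentences, skipping empty ones."""
    doc = nlp(line)
    return [sent.text.strip() for sent in doc.sents if sent.text.strip()]


with open('policy.txt') as infile, open('policy_sentencized.txt', 'w') as ofile:
    for line in infile:
        # Transliterate non-ASCII characters to their closest ASCII form.
        line = unidecode(line)
        # Write one sentence per output line.
        ofile.writelines(sent + '\n' for sent in sentencized(line))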