@j1o1h1n
Last active October 21, 2021 22:22
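from nltk.tokenize import TreebankWordTokenizer
# Assumed definition: the original snippet uses this name without defining it.
_treebank_word_tokenizer = TreebankWordTokenizer()

def sent_tokenize(text, tokenizer):
    # tokenizer: a preloaded sentence tokenizer, e.g. a Punkt model
    return tokenizer.tokenize(text)

def word_tokenize(text, tokenizer, preserve_line=False):
    # With preserve_line=True, skip sentence splitting and tokenize text as one unit.
    sentences = [text] if preserve_line else sent_tokenize(text, tokenizer)
    return [
        token for sent in sentences for token in _treebank_word_tokenizer.tokenize(sent)
    ]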
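The two functions only require that the objects passed in expose a `tokenize(text)` method, so they can be exercised without loading any NLTK model. The sketch below is a minimal illustration under that assumption: `RegexSentenceTokenizer` and `RegexWordTokenizer` are hypothetical stand-ins (not part of NLTK or of the original gist) for a preloaded sentence tokenizer and for the Treebank word tokenizer, and their regex rules are far cruder than the real ones.

```python
import re


class RegexSentenceTokenizer:
    """Hypothetical stand-in for a preloaded sentence tokenizer (e.g. Punkt)."""

    def tokenize(self, text):
        # Split after '.', '!' or '?' when followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class RegexWordTokenizer:
    """Hypothetical stand-in for the Treebank word tokenizer."""

    def tokenize(self, sentence):
        # Match words (allowing internal apostrophes) or single punctuation marks.
        return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


# Stand-in for the module-level word tokenizer used by word_tokenize.
_treebank_word_tokenizer = RegexWordTokenizer()


def sent_tokenize(text, tokenizer):
    return tokenizer.tokenize(text)


def word_tokenize(text, tokenizer, preserve_line=False):
    sentences = [text] if preserve_line else sent_tokenize(text, tokenizer)
    return [
        token for sent in sentences for token in _treebank_word_tokenizer.tokenize(sent)
    ]


sentence_tokenizer = RegexSentenceTokenizer()
text = "Hello world. How are you?"
sentences = sent_tokenize(text, sentence_tokenizer)
tokens = word_tokenize(text, sentence_tokenizer)
print(sentences)  # ['Hello world.', 'How are you?']
print(tokens)     # ['Hello', 'world', '.', 'How', 'are', 'you', '?']
```

Passing the sentence tokenizer as an argument means the caller builds or loads it once and reuses it across calls, rather than having it looked up inside each call.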