@Sirsirious
Last active February 5, 2020 17:23
Making our DummyTokenizer.
import string


class DummyTokenizer:
    def __init__(self, sentence, token_boundaries=(' ', '-'),
                 punctuations=string.punctuation, delimiter_token='<SPLIT>'):
        # A tuple default avoids sharing one mutable list across instances.
        self.tokens = []
        self.raw = str(sentence)
        self._token_boundaries = token_boundaries
        self._delimiter_token = delimiter_token
        self._punctuations = punctuations
        self._index = 0
        # _tokenize() is not defined in this snippet; it is expected to fill self.tokens.
        self._tokenize()
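
The constructor calls `self._tokenize()`, which this gist does not show. The following is only a rough sketch of what such a step might do, assuming the delimiter token is used as a temporary split marker. The standalone function name `sketch_tokenize` is hypothetical and is not part of the gist; its parameters mirror the constructor's arguments.

```python
import string


def sketch_tokenize(raw, token_boundaries=(' ', '-'),
                    punctuations=string.punctuation,
                    delimiter_token='<SPLIT>'):
    """Hypothetical stand-in for DummyTokenizer._tokenize (an assumption)."""
    work = str(raw)
    # Pad each punctuation mark with spaces so it becomes its own token.
    # This runs before the delimiter is inserted, because the default
    # delimiter '<SPLIT>' itself contains punctuation characters.
    for mark in punctuations:
        work = work.replace(mark, ' ' + mark + ' ')
    # Turn every boundary character into the delimiter token.
    for boundary in token_boundaries:
        work = work.replace(boundary, delimiter_token)
    # Split on the delimiter and drop the empty pieces left by
    # consecutive boundaries.
    return [piece.strip() for piece in work.split(delimiter_token)
            if piece.strip()]


print(sketch_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

With the default arguments, `-` is both a punctuation mark and a token boundary, so in this sketch a hyphen splits a word and is itself discarded: `sketch_tokenize("well-known fact")` gives `['well', 'known', 'fact']`.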