@jabirjamal
Created July 23, 2021 08:55
custom tokenizer
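import spacy
from spacy.symbols import ORTH

# Assumption: any English pipeline works here; a blank one is enough for tokenization.
nlp = spacy.blank("en")
doc = nlp("gimme that book")
print([w.text for w in doc])  # ['gimme', 'that', 'book']
# Special case: split "gimme" into "gim" + "me" (the ORTH pieces must join back to "gimme").
special_case = [{ORTH: "gim"}, {ORTH: "me"}]
nlp.tokenizer.add_special_case("gimme", special_case)
doc = nlp("gimme that book")  # re-tokenize; the earlier doc keeps its old tokens
print([w.text for w in doc])  # ['gim', 'me', 'that', 'book']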
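
The idea behind a special case can be sketched without spaCy at all. The toy below is not spaCy's implementation: `tokenize` and `rules` are hypothetical names, and the real tokenizer also handles prefixes, suffixes, and infixes. It only shows that a rule maps one exact string to the pieces it should become, and that the rule affects text tokenized after it is added.

```python
# Toy illustration only: NOT how spaCy is implemented.
def tokenize(text, special_cases):
    """Split on whitespace, then expand any chunk that has a special-case rule."""
    tokens = []
    for chunk in text.split():
        # A rule maps one exact chunk to the list of pieces it should become.
        tokens.extend(special_cases.get(chunk, [chunk]))
    return tokens


rules = {}  # hypothetical rule table, empty at first
print(tokenize("gimme that book", rules))  # ['gimme', 'that', 'book']

rules["gimme"] = ["gim", "me"]  # the pieces join back to the original string
print(tokenize("gimme that book", rules))  # ['gim', 'me', 'that', 'book']
```

As in the snippet above, the text has to be tokenized again after the rule is registered; a result computed earlier does not change.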