@widiger-anna
Last active May 4, 2018 18:34
Testing spaCy tokenizer
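from __future__ import unicode_literals, print_function
# Install spaCy, spacymoji and the small English model:
#   pip install spacy spacymoji && python -m spacy download en_core_web_sm
import spacy
from spacymoji import Emoji

nlp = spacy.load('en_core_web_sm')
# spaCy v2.x API (current when this gist was written): add the component instance.
# Under spaCy v3 the equivalent is nlp.add_pipe('emoji', first=True).
emoji = Emoji(nlp)
nlp.add_pipe(emoji, first=True)

# Expected tokens: This(0) is(1) a(2) test(3) 😻(4) 👍🏿(5)
doc = nlp(u"This is a test 😻 👍🏿")
assert doc._.has_emoji              # the Doc contains at least one emoji
assert doc[2:5]._.has_emoji         # the Span "a test 😻" contains an emoji
assert not doc[0]._.is_emoji        # "This" is not an emoji
assert doc[4]._.is_emoji            # 😻 is an emoji token
assert doc[5]._.emoji_desc == u'thumbs up dark skin tone'
# doc._.emoji lists (emoji text, token index, description) tuples
assert len(doc._.emoji) == 2
assert doc._.emoji[1] == (u'👍🏿', 5, u'thumbs up dark skin tone')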
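The assertions on index 5 rely on 👍🏿 being treated as one token, even though Python sees it as two code points: the thumbs-up sign followed by a Fitzpatrick skin-tone modifier. The standard-library sketch below makes that visible and shows one simple way to keep a base emoji and its modifier together. `merge_skin_tones` is a hypothetical helper for illustration only, not spacymoji's implementation, and it handles only the skin-tone range U+1F3FB..U+1F3FF (not ZWJ sequences or flags).

```python
import unicodedata

# Fitzpatrick skin-tone modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONES = range(0x1F3FB, 0x1F400)


def merge_skin_tones(text):
    """Hypothetical helper: split text into code points, attaching any
    skin-tone modifier to the code point that precedes it."""
    clusters = []
    for ch in text:
        if clusters and ord(ch) in SKIN_TONES:
            clusters[-1] += ch  # glue the modifier onto the preceding emoji
        else:
            clusters.append(ch)
    return clusters


thumbs = "\U0001F44D\U0001F3FF"  # 👍🏿
print(len(thumbs))  # 2
print([unicodedata.name(c) for c in thumbs])
# ['THUMBS UP SIGN', 'EMOJI MODIFIER FITZPATRICK TYPE-6']
print(merge_skin_tones("\U0001F63B \U0001F44D\U0001F3FF"))

# With the modifier kept attached, the token indices match the gist's asserts.
words = "This is a test \U0001F63B \U0001F44D\U0001F3FF".split()
print(words[4], words[5])
```

Whitespace splitting is enough here only because the test sentence separates every word and emoji with a space; the spaCy pipeline does the real tokenization in the gist.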