@MemphisMeng
Created August 12, 2020 21:24
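from collections import Counter
import en_core_web_sm

# Load spaCy's small English model.
nlp = en_core_web_sm.load()
# `tweets` is assumed to be defined earlier (e.g. a pandas DataFrame whose
# `tweets` column holds the tweet text); join all tweets into one document.
tweet_article = nlp('|'.join(tweets.tweets))
# Keep only the entities that spaCy labels as persons.
items = [ent.text for ent in tweet_article.ents if ent.label_ == 'PERSON']
# Take the 20 most frequent names, then drop obvious misclassifications
# (links, mentions, hashtags), so fewer than 20 names may remain.
items = [name for name, _ in Counter(items).most_common(20)
         if 'http' not in name and '@' not in name and '#' not in name]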
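
The counting and filtering step can be tried on its own, without loading a spaCy model. The sketch below is a variation, not the gist's exact logic: the helper name `top_names`, its parameters, and the sample list are all hypothetical, and it filters out the noisy strings before ranking so that up to `n` names come back.

```python
from collections import Counter

def top_names(names, n=20, banned=("http", "@", "#")):
    """Return up to n most frequent names, skipping any that contain a banned substring."""
    # Filter first so that links, mentions and hashtags do not take up ranking slots.
    clean = [name for name in names if not any(b in name for b in banned)]
    return [name for name, _ in Counter(clean).most_common(n)]

# Hypothetical stand-in for the PERSON entities extracted by spaCy.
sample = [
    "LeBron James", "LeBron James", "@KingJames", "Kobe Bryant",
    "http://t.co/x", "LeBron James", "Kobe Bryant", "#NBA",
]
print(top_names(sample, n=2))  # ['LeBron James', 'Kobe Bryant']
```

Filtering before `most_common` is a design choice: the snippet above it ranks first and filters afterwards, which can leave fewer than 20 names in the result.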