@JarrydWannenburg
Created September 3, 2022 15:59
Google_News_Extraction_Article
# Count the most frequently mentioned people (PERSON entities) in an article
import spacy

# Assumption: the original snippet relies on an `nlp` pipeline defined elsewhere;
# any spaCy English pipeline with an NER component should work here.
nlp = spacy.load("en_core_web_sm")


def get_person_counts(text, n=3):
    # Replace line breaks with the marker text " paragraph break "
    text = text.replace("\n", " paragraph break ")
    doc = nlp(text)
    # Loop through the doc object and extract PERSON (people) entities,
    # stripping possessive "'s" so "Smith's" and "Smith" are counted together
    res = []
    for ent in doc.ents:
        if ent.label_ == 'PERSON':
            res.append(ent.text.replace("'s", ""))
    # Create a dictionary that counts the number of times each name is mentioned
    # https://stackoverflow.com/questions/61712565/count-words-in-a-list-and-add-them-to-a-dictionary-along-with-number-of-occurre
    word_count = {}
    for item in res:
        if item in word_count:
            word_count[item] += 1
        else:
            word_count[item] = 1
    # Return a dictionary of the top n people, sorted by count in descending order
    top_n_person = dict(sorted(word_count.items(), key=lambda x: x[1], reverse=True)[:n])
    return top_n_person
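
The counting and sorting steps above could also be written with `collections.Counter` from the standard library. The following is only a minimal sketch of that alternative, not part of the original gist: the helper name `top_n_counts` and the sample list `people` are hypothetical, and the list stands in for the PERSON entity strings that spaCy would return.

```python
from collections import Counter


def top_n_counts(names, n=3):
    """Return the n most frequent names with their counts, most frequent first."""
    # Counter tallies occurrences; most_common(n) yields (name, count) pairs
    # ordered by count in descending order.
    return dict(Counter(names).most_common(n))


# Hypothetical sample data standing in for extracted PERSON entities
people = [
    "Ada Lovelace", "Alan Turing", "Ada Lovelace",
    "Grace Hopper", "Alan Turing", "Ada Lovelace",
]

print(top_n_counts(people, n=2))
```

This behaves like the manual dictionary loop followed by `sorted(...)[:n]`, but it leaves the tallying to the standard library.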