@JarrydWannenburg
Created September 3, 2022 15:15
Google_News_Extraction_Article
# Assign just the information on the articles to our wells_fargo obj
wells_fargo = wells_fargo['articles'] # 100 is the max length of articles to return
# Extract the urls for each article returned by newsAPI
wells_fargo_urls = [i['url'] for i in wells_fargo]
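# Using newspaper3k, create a function to return an article given its URL
# See https://newspaper.readthedocs.io/en/latest/user_guide/quickstart.html for more detail
from newspaper import Article

def get_article(url):
    # Skip image fetching; only the article text and metadata are needed here
    article = Article(url, fetch_images=False, memoize_articles=False)
    article.download()  # fetch the raw HTML
    article.parse()     # extract title, authors, text, etc.
    return article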
# For all urls returned by the keyword search, use newspaper3k to extract the article as an obj
wells_fargo_articles = [get_article(i) for i in wells_fargo_urls]
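The list comprehension above stops at the first URL that fails to download or parse. Below is a minimal sketch of one way to keep going and record the failures instead. The helper name fetch_all and its (results, failures) return shape are illustrative assumptions, not part of newspaper3k or the original gist. The except clause is deliberately broad because the exact exceptions depend on the site and the library version.

```python
from typing import Any, Callable, Iterable, List, Tuple


def fetch_all(
    urls: Iterable[str], fetch: Callable[[str], Any]
) -> Tuple[List[Any], List[Tuple[str, str]]]:
    """Apply `fetch` to each URL, collecting results and failures separately.

    `fetch_all` is a hypothetical helper; `fetch` would be `get_article`
    from the snippet above.
    """
    results: List[Any] = []
    failures: List[Tuple[str, str]] = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:  # broad on purpose: one bad URL should not abort the batch
            failures.append((url, str(exc)))
    return results, failures


# Possible usage with the names from the snippet above:
# wells_fargo_articles, failed_urls = fetch_all(wells_fargo_urls, get_article)
```

Keeping the failed URLs alongside the error message makes it possible to retry them later or to check how much of the keyword search was actually covered.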