Created September 3, 2022 15:15
JarrydWannenburg/ff0779af81408b821ef04c4ebf8db823
Google_News_Extraction_Article
# Requires newspaper3k (pip install newspaper3k)
from newspaper import Article

# Keep only the list of articles from the newsAPI response
# (newsAPI returns at most 100 articles per request)
wells_fargo = wells_fargo['articles']
# Extract the URL of each article returned by newsAPI
wells_fargo_urls = [i['url'] for i in wells_fargo]

# Using newspaper3k, define a function that returns a parsed article given its URL
# See https://newspaper.readthedocs.io/en/latest/user_guide/quickstart.html for more detail
def get_article(url):
    # Don't download images, and don't memoize (cache) previously seen articles
    article = Article(url, fetch_images=False, memoize_articles=False)
    article.download()  # fetch the raw HTML
    article.parse()     # extract the title, authors, body text, etc.
    return article

# For every URL returned by the keyword search, use newspaper3k to extract the article as an object
wells_fargo_articles = [get_article(i) for i in wells_fargo_urls]
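The final list comprehension aborts on the first URL that fails, discarding every article already downloaded. Failures are common with news results (paywalls, bot blocking, timeouts), and in newspaper3k `parse()` raises an exception when `download()` did not succeed. Below is a minimal sketch, not the author's method, of one way to make this step more robust. It drops empty or repeated URLs, runs the fetches in a thread pool (the downloads are network-bound), and records failures instead of raising. `fetch_all` and `safe_fetch` are hypothetical helpers. `fetch` is assumed to be any one-argument callable such as the gist's `get_article`. The thread pool is a design choice added here, not something the original uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def fetch_all(
    urls: List[str],
    fetch: Callable[[str], T],
    max_workers: int = 8,
) -> Tuple[List[T], Dict[str, str]]:
    """Hypothetical helper (not part of newspaper3k or newsAPI).

    Fetches every unique, non-empty URL concurrently with `fetch`.
    Returns the successful results in input order, plus a
    {url: error message} dict for the URLs whose fetch raised.
    """
    # Drop empty URLs and repeats (search results may contain duplicates), keeping order
    unique_urls = list(dict.fromkeys(u for u in urls if u))

    def safe_fetch(url: str) -> Tuple[str, Optional[T], Optional[str]]:
        # Catch per-URL errors so one bad site does not abort the whole batch
        # (e.g. newspaper3k raising on parse() after a failed download)
        try:
            return url, fetch(url), None
        except Exception as exc:
            return url, None, f"{type(exc).__name__}: {exc}"

    results: List[T] = []
    failures: Dict[str, str] = {}
    # Threads overlap the network waits of I/O-bound downloads
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whatever order they finish in
        for url, value, error in pool.map(safe_fetch, unique_urls):
            if error is None:
                results.append(value)
            else:
                failures[url] = error
    return results, failures


if __name__ == "__main__":
    # Offline demo: a fake fetcher stands in for get_article
    def fake_fetch(url: str) -> str:
        if "paywall" in url:
            raise RuntimeError("403 Forbidden")
        return url.upper()

    ok, bad = fetch_all(
        ["https://a.example", "https://paywall.example", "https://a.example"],
        fake_fetch,
    )
    print(ok)   # ['HTTPS://A.EXAMPLE']
    print(bad)  # {'https://paywall.example': 'RuntimeError: 403 Forbidden'}
```

In the gist's context this would be called as `wells_fargo_articles, failed = fetch_all(wells_fargo_urls, get_article)`. Keeping `max_workers` modest is a reasonable courtesy to the news sites being scraped, and `failed` shows which URLs to retry or drop.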