How to Web Scrape all Google News Articles with Python and SerpApi
# video tutorial: https://www.youtube.com/watch?v=fOs_eOsLP54
from serpapi import GoogleSearch
from urllib.parse import parse_qsl, urlsplit

params = {
    "api_key": "...",    # serpapi api key
    "engine": "google",  # search engine
    "q": "minecraft",    # search query
    "gl": "us",          # country of the search
    "hl": "en",          # language
    "num": "100",        # number of news results per page
    "tbm": "nws"         # news results
}

search = GoogleSearch(params)  # build the search; the request itself is sent by get_dict()

# to show page number
page_num = 0

# iterate over all pages
while True:
    results = search.get_dict()  # JSON -> Python dict

    # stop on an API error (for example, an invalid key or no results)
    if "error" in results:
        print(results["error"])
        break

    page_num += 1
    print(f"Current page: {page_num}")

    # iterate over news results and extract the data
    for result in results.get("news_results", []):
        print(result.get("position"), result.get("title"), sep="\n")

    # if there is a next page, copy the query parameters of its URL into the search
    if "next" in results.get("serpapi_pagination", {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query)))
    else:
        break
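
The pagination step packs several calls into one line, so here is a minimal offline sketch of what it does to the search parameters. The `next_url` value is a made-up example of what a `serpapi_pagination["next"]` link might look like, not real API output, and `next_params` is a name introduced only for this illustration.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical "next" link (assumed shape, for illustration only).
next_url = "https://serpapi.com/search.json?engine=google&q=minecraft&start=100&tbm=nws"

# Parameters of the current request; the API key is left out on purpose.
params = {"engine": "google", "q": "minecraft", "tbm": "nws", "num": "100"}

# urlsplit(...).query -> "engine=google&q=minecraft&start=100&tbm=nws"
# parse_qsl(...)      -> list of (key, value) pairs
# dict(...)           -> mapping that can be merged into the current parameters
next_params = dict(parse_qsl(urlsplit(next_url).query))

# update() overwrites matching keys and adds new ones (here: "start").
params.update(next_params)

print(params)
```

Because `update()` only touches keys present in the link, values such as `num` or `api_key` that the link does not carry stay as they were for the next request.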