@nirmalyaghosh
Created August 7, 2016 13:20
Scrapes Google News search results to gather news summaries. Written in response to http://stackoverflow.com/q/38769951. The code can obviously be improved; it was written quickly to give an idea of the approach.
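The request the script sends is an ordinary Google search with `tbm=nws` added to restrict it to news results. The sketch below shows how that URL can be built with the standard library instead of string concatenation. It assumes the HTTPS form of the `google.co.uk` endpoint. The helper name `build_news_search_url` is hypothetical, and Google's query parameters are not a documented, stable API.

```python
from urllib.parse import urlencode

# Assumption: HTTPS form of the endpoint the gist uses; not a stable, documented API.
GOOGLE_SEARCH = "https://www.google.co.uk/search"


def build_news_search_url(query: str) -> str:
    """Hypothetical helper: a Google search URL restricted to news results (tbm=nws)."""
    # urlencode percent-encodes quotes (%22) and turns spaces into '+',
    # which is why a hand-built query can use '+' as a word separator.
    return GOOGLE_SEARCH + "?" + urlencode({"q": query, "tbm": "nws"})


print(build_news_search_url('"Sovereign-Debt" Government-Bonds'))
# https://www.google.co.uk/search?q=%22Sovereign-Debt%22+Government-Bonds&tbm=nws
```

Letting `urlencode` (or the `params` argument of `requests.get`) do the encoding means queries containing quotes, `&` or `+` are sent as intended.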
```python
from random import randint
import time

import requests
from bs4 import BeautifulSoup


def scrape_news_summaries(s):
    # It is based on a notebook posted on Kaggle, http://bit.ly/1VJ8pF9
    time.sleep(randint(0, 2))  # pause 0 to 2 seconds so Google is less likely to throttle us
    # Pass the query via `params` so requests URL-encodes it; tbm=nws selects news results
    r = requests.get("https://www.google.co.uk/search",
                     params={"q": s, "tbm": "nws"})
    r.raise_for_status()  # fail loudly on e.g. HTTP 429 instead of returning nothing
    soup = BeautifulSoup(r.text, "html.parser")
    # "st" was the class of Google's snippet divs when this was written (2016);
    # Google changes its markup often, so this selector may need updating.
    st_divs = soup.find_all("div", {"class": "st"})
    return [st_div.text for st_div in st_divs]


# l = scrape_news_summaries("T-Notes")
l = scrape_news_summaries('"Sovereign-Debt" Government-Bonds')
for n in l:
    print(n, "\n")
```
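Because Google's markup changes, it helps to test the parsing step offline against saved HTML. The sketch below is dependency-free: it swaps BeautifulSoup for the standard library's `html.parser` and collects the text of every `<div>` whose class list contains `st`, the class the script targets. The names `SnippetParser` and `extract_summaries` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class SnippetParser(HTMLParser):
    """Collect the text of every <div> whose class list contains target_class."""

    def __init__(self, target_class: str = "st"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching div (counts nested divs)
        self.current = []   # text fragments of the snippet being read
        self.snippets = []  # finished snippets

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside a snippet
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag != "div" or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # closed the matching div itself
            self.snippets.append("".join(self.current))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_summaries(html: str, target_class: str = "st") -> List[str]:
    """Return the text of each matching div, in document order."""
    parser = SnippetParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.snippets


sample = '<div class="st">First <b>summary</b></div><div class="other">skip</div><div class="st x">Second</div>'
print(extract_summaries(sample))  # ['First summary', 'Second']
```

Like BeautifulSoup's `{"class": "st"}` filter, this matches `st` as one class among several, so `class="st x"` counts. Saving a real response to disk and running it through a function like this makes it easy to see when Google's markup no longer contains the expected class.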