@varundey
Created April 9, 2016 11:51
RSS news feed crawler
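# Crawl the Times of India RSS index page and collect feed names and links.
import requests
from bs4 import BeautifulSoup

url = "http://timesofindia.indiatimes.com/rss.cms"

# Parse the page (the "lxml" parser requires the lxml package).
soup = BeautifulSoup(requests.get(url).content, "lxml")

# Select the layout tables that carry these exact attributes.
tables = soup.find_all("table", {"border": "0", "width": "740", "cellspacing": "0", "cellpadding": "0"})
print(len(tables))

feeds = {}
for table in tables:
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        link = cells[0].find("a") if cells else None
        if link is None:
            continue  # skip rows without a link in the first cell
        # Map the first cell's text to the link target.
        feeds[cells[0].text] = link.get("href")

# Append the collected mapping to the output file.
with open("newsrss.txt", "a") as out:
    out.write(str(feeds))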
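
The script writes the dictionary's str() form in append mode, so repeated runs leave several dictionary literals back to back in newsrss.txt, which is awkward to read back. One possible alternative, sketched below, is to store the mapping as JSON and overwrite the file on each run. The helper names save_feeds and load_feeds and the file name newsrss.json are hypothetical and not part of the original gist.

```python
import json


def save_feeds(feeds, path):
    """Write the name -> URL mapping as JSON, replacing earlier contents."""
    with open(path, "w", encoding="utf-8") as out:
        json.dump(feeds, out, indent=2, ensure_ascii=False)


def load_feeds(path):
    """Read back a mapping previously written by save_feeds."""
    with open(path, encoding="utf-8") as src:
        return json.load(src)
```

With the crawler's dictionary in hand, save_feeds(feeds, "newsrss.json") would replace the final write step, and load_feeds("newsrss.json") would return the same mapping in a later session.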