@relaiyavalli
Last active August 29, 2015 14:02
Here is a simple Python code snippet to retrieve news titles from news feeds. This example retrieves technology news headlines from Reuters. It needs only urllib2 and cookielib, both of which ship with the Python 2 standard library. Please take care to scrape only sites and RSS feeds that permit it.
import re
import urllib2
from cookielib import CookieJar

# Keep cookies between requests in case the web site expects them
cookie = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
# Make the scraper pose as a browser (the attribute is `addheaders`, all lowercase)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
page = 'http://feeds.reuters.com/reuters/technologyNews'


def main():
    try:
        # Open the page and retrieve its contents
        pageData = opener.open(page).read()
        # Filter for news headlines; this also matches the feed's own <title>
        titles = re.findall(r'<title>(.*?)</title>', pageData)
        for title in titles:
            print title
    except Exception as e:
        print str(e)


main()
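
The script above targets Python 2. As a rough sketch of how the same idea might look on Python 3, the block below swaps `urllib2` and `cookielib` for their Python 3 counterparts (`urllib.request` and `http.cookiejar`) and replaces the regular expression with `xml.etree.ElementTree`, so that only per-item titles are returned and CDATA-wrapped titles are handled. This is not the gist author's method: the helper names `build_opener` and `extract_titles` are hypothetical, `SAMPLE_FEED` is made-up data so the example runs offline, and the feed is assumed to follow the usual RSS 2.0 layout.

```python
import urllib.request
import xml.etree.ElementTree as ET
from http.cookiejar import CookieJar

# Made-up sample feed (an assumption), used only so the example runs offline.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Technology News</title>
    <item><title>First headline</title></item>
    <item><title><![CDATA[Second headline & more]]></title></item>
  </channel>
</rss>"""


def build_opener(user_agent="Mozilla/5.0"):
    """Return an opener that stores cookies and sends a browser-like User-Agent."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.addheaders = [("User-agent", user_agent)]
    return opener


def extract_titles(feed_xml):
    """Return the text of each <item>'s <title>, assuming an RSS 2.0 layout."""
    root = ET.fromstring(feed_xml)
    # Iterating over <item> elements skips the channel-level <title>.
    return [item.findtext("title", default="") for item in root.iter("item")]


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_FEED):
        print(title)
```

Against a live feed, the same helper could be called as `extract_titles(build_opener().open(page).read())`, assuming the URL still serves RSS and the site permits scraping.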