@schwarzmx
Created December 5, 2013 05:17
Simple script for retrieving HTML from some sites
import urllib
urls = ['http://worldnews.nbcnews.com/_news/2013/11/25/21611878-easing-of-iran-sanctions-will-do-little-to-lift-crippled-economy-experts-say?lite',
'http://nbcpolitics.nbcnews.com/_news/2013/11/25/21611792-obama-on-iran-deal-us-cannot-close-the-door-on-diplomacy?lite',
'http://nbcpolitics.nbcnews.com/_news/2013/11/25/21612201-obama-tells-heckler-no-executive-action-to-halt-deportations?lite',
'http://www.cbsnews.com/news/iranians-hope-nuclear-deal-will-boost-economy/',
'http://www.cbsnews.com/news/iranians-hope-nuclear-deal-will-boost-economy/',
'http://www.foxnews.com/science/2013/11/25/iran-uranium-temporarily-converted-experts-warn/',
'http://www.foxnews.com/politics/2013/11/25/senators-weigh-additional-sanctions-amid-iran-nuclear-deal/',
'http://www.foxnews.com/politics/2013/11/25/wife-imprisoned-pastor-says-family-devastated-after-iran-deal/']
# foxnews checks the HTTP request header called 'User-Agent'
# urllib's default value seems to be banned, so I override it with my own
class AppURLopener(urllib.FancyURLopener):
    version = "Fernando's Fake Agent"
urllib._urlopener = AppURLopener()
for url in urls:
    f = urllib.urlopen(url)
    print "***********************************************************"
    print "URL content:"
    print f.read()
    print "***********************************************************"
    print "url read: " + url
    raw_input("Press Enter to continue...")
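
The script above targets Python 2, where `urllib.FancyURLopener` and `urllib.urlopen` exist. On Python 3, `FancyURLopener` has been deprecated since 3.3, so the same User-Agent override is usually done by passing a header to `urllib.request.Request`. What follows is only a minimal sketch of that approach, not part of the original gist: the helper names `build_request` and `fetch`, the 10-second timeout, and the UTF-8 fallback are all assumptions made for illustration.

```python
import urllib.request

# Same agent string the original script uses.
USER_AGENT = "Fernando's Fake Agent"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a custom User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Download url and return its body as text (hypothetical helper).

    The timeout and the UTF-8 fallback are assumptions, not taken from the gist.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `print(fetch(url))` for each entry in `urls`, followed by `input("Press Enter to continue...")`, would mirror the loop in the original script.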