@00krishna
Created February 15, 2014 23:21
Get all links in a web page
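import re
import urllib.request

# Read at most 200,000 bytes of the page and decode them to text.
raw = urllib.request.urlopen("http://sebsauvage.net/index.html").read(200000)
htmlSource = raw.decode("utf-8", errors="replace")
# Capture whatever follows href= in anchors that start with '<a href=' (quotes included).
linksList = re.findall('<a href=(.*?)>.*?</a>', htmlSource)
for link in linksList:
    print(link)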
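The regular expression above only matches anchors written exactly as `<a href=...>` on a single line, so it misses tags where another attribute comes first or where the tag name is uppercase. As a hedged alternative, here is a minimal sketch that uses the standard library's `html.parser` instead of a regex. The names `LinkCollector` and `extract_links` and the `sample` string are hypothetical, introduced only for illustration; they are not part of the original gist.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper, not from the gist)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return every href found in anchor tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Illustrative input only; in practice html_text would be the decoded page source.
sample = '<p><a class="x" href="http://example.com/a">A</a> <A HREF=/b>B</A></p>'
for link in extract_links(sample):
    print(link)
```

Passing the decoded page text (`htmlSource` in the snippet above) to `extract_links` should give the href values without surrounding quotes. Relative links come back as written in the page and would still need to be resolved against the page URL, for example with `urllib.parse.urljoin`.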