@vgoklani
Created October 27, 2011 14:52
Use BeautifulSoup to extract URLs from an HTML file
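from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4 use: from bs4 import BeautifulSoup

# Parse the HTML file; the with-block closes the file handle afterwards.
with open('filename.txt', 'r') as f:
    soup = BeautifulSoup(f)

# href=True matches only <a> tags that actually carry an href attribute.
for tag in soup.findAll('a', href=True):
    print(tag['href'])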
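
For comparison, here is a minimal sketch of the same idea using only the Python 3 standard library (`html.parser`), for cases where BeautifulSoup is not installed. This is not the gist's method: the names `LinkCollector`, `extract_links`, and `sample` are hypothetical, and the parser is stricter about malformed markup than BeautifulSoup is.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return all href values found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Illustrative input only; in practice the string would come from reading the file.
sample = '<p><a href="https://example.com/a">A</a> <a name="x">no link</a> <a href="/b">B</a></p>'
print(extract_links(sample))  # prints ['https://example.com/a', '/b']
```

Anchors without an `href` (such as the named anchor in the sample) are skipped, which mirrors the `href=True` filter in the snippet above.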