@mariomartinezsz
Created January 5, 2013 20:20
Find links to PDF files in an HTML page with BeautifulSoup (one level only; linked pages are not followed)
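# Python 2 script: urllib2 does not exist in Python 3.
import urllib2
from bs4 import BeautifulSoup

my_url = 'http://slav0nic.org.ua/static/books/python/'
html = urllib2.urlopen(my_url).read()
# Name the parser explicitly so the result does not depend on which parsers are installed.
sopa = BeautifulSoup(html, 'html.parser')

# Only the anchors on this page are checked; linked pages are not followed.
for link in sopa.find_all('a'):
    current_link = link.get('href')
    # An <a> tag without an href makes get() return None, so test for that first.
    if current_link and current_link.endswith('pdf'):
        print('Found a pdf: ' + current_link)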
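On Python 3, where `urllib2` is gone, the same one-level scan can be sketched with the standard library alone. This swaps BeautifulSoup for `html.parser.HTMLParser`, so it is a different technique from the script above, not a port of it. The names `PdfLinkParser`, `find_pdf_links` and `fetch_pdf_links` are hypothetical helpers introduced for illustration, and the case-insensitive `.pdf` check and the resolution of relative links with `urljoin` are assumptions about what is wanted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose target ends in .pdf (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        # Skip anchors without an href; compare the extension case-insensitively.
        if href and href.lower().endswith('.pdf'):
            # Resolve relative links against the URL of the page they came from.
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return absolute URLs of the PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


def fetch_pdf_links(url):
    """Download one page and scan it; linked pages are not followed."""
    page = urlopen(url).read().decode('utf-8', errors='replace')
    return find_pdf_links(page, url)
```

Calling `fetch_pdf_links(my_url)` with the URL from the script above would perform the network request; `find_pdf_links` can be tried offline on any HTML string.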