Skip to content

Instantly share code, notes, and snippets.

@nicoledan
Created March 8, 2017 20:26
Show Gist options
  • Save nicoledan/c7f5d386d51142d55577714c6975b7dd to your computer and use it in GitHub Desktop.
Save nicoledan/c7f5d386d51142d55577714c6975b7dd to your computer and use it in GitHub Desktop.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://www.imdb.com/chart/top")
bsObj = BeautifulSoup(html, "html.parser")
t = open("imdb250.txt", 'w')
for link in bsObj.find("div", {"id":"pagecontent"}).findAll( "a", href=re.compile("^(/title/)(.)*$") ):
if 'href' in link.attrs:
t.write(str(link.attrs['href']) + "\n")
t.close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment