@isaac-ped
Created July 24, 2018 18:30
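# Python 2 script (urllib2, BeautifulSoup 3): list the featured songs from
# Wikipedia's "List of Off the Air episodes" page.
import re
import urllib2

from BeautifulSoup import BeautifulSoup

# Specify the URL you want to query
url = "https://en.wikipedia.org/wiki/List_of_Off_the_Air_episodes"

# Query the website and return the HTML to the variable 'page'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

# Find every <span> whose id matches the pattern "Featured_songs.*"
featured = soup.findAll('span', attrs={'id': re.compile("Featured_songs.*")})

songs = []
for feature in featured:
    # Collect the list items under the span's grandparent element
    items = feature.parent.parent.findAll('li')
    for item in items:
        songs.append(item.text)

# Print each song, then the total count
for song in songs:
    print(song)
print(len(songs))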