@qlyoung
Last active August 29, 2015 14:02
rough proof-of-concept for alt-text of https://xkcd.com/903/
#!/usr/bin/python
# dependencies: BeautifulSoup, requests
from BeautifulSoup import BeautifulSoup
import requests

# starting page
url = raw_input("Wikipedia URL: ")
pagename = ''
# keep following the first link until a page name contains "Philosophy"
while "Philosophy" not in pagename:
    page = requests.get(url).content
    soup = BeautifulSoup(page)
    # main content
    mc = soup.findAll(attrs={'id': 'mw-content-text'})[0]
    # first top-level paragraph in main content
    pg = mc.findChild(name='p', recursive=False)
    # first top-level hyperlink in paragraph
    hl = pg.findChild(name='a', recursive=False)
    # href of hyperlink, e.g. /wiki/Some_Page
    uri = hl['href']
    # compose URL
    url = "http://en.wikipedia.org" + uri
    # record the page name so the loop condition can eventually become false
    pagename = uri.split('/')[2]
    print pagename + " -> "
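
The loop in the script keeps requesting pages until a name contains "Philosophy", so a chain of links that loops back on itself, or a paragraph with no link, would not stop cleanly. Below is a minimal sketch of one way to guard the traversal. It is not part of the original gist: the function name `follow_first_links`, the `next_page` callback, the `max_hops` limit, and the `fake_links` graph are all hypothetical, and it is written for Python 3 rather than the Python 2 of the script. The link lookup is passed in as a function so the logic can be tried without touching the network.

```python
def follow_first_links(start, next_page, target="Philosophy", max_hops=100):
    """Follow next_page() from start until a name containing target is reached.

    next_page(name) should return the next page name, or None if the page
    has no usable link. Returns (path, status), where status is one of
    "reached", "cycle", "dead end", or "gave up".
    """
    path = [start]
    seen = {start}
    current = start
    while target not in current:
        # stop once max_hops links have been followed
        if len(path) > max_hops:
            return path, "gave up"
        following = next_page(current)
        if following is None:
            return path, "dead end"
        path.append(following)
        # a page seen before means the chain will repeat forever
        if following in seen:
            return path, "cycle"
        seen.add(following)
        current = following
    return path, "reached"


# Hypothetical link graph standing in for real Wikipedia pages.
fake_links = {
    "Start": "Middle",
    "Middle": "Philosophy",
    "A": "B",
    "B": "A",
}

path, status = follow_first_links("Start", fake_links.get)
print(" -> ".join(path), "(" + status + ")")
```

In the real script, `next_page` would wrap the request-and-parse steps and return the page name taken from the first link's href, or None when no paragraph or link is found.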