@suriyadeepan
Last active August 26, 2016 06:02
Scrape the Table of Contents from a Wiki page using Beautiful Soup
from bs4 import BeautifulSoup
import requests

url = 'https://en.wikipedia.org/wiki/Transhumanism'

# fetch the raw HTML of the page
content = requests.get(url).content

# build the soup, using the lxml parser
soup = BeautifulSoup(content, 'lxml')

# find the table-of-contents container: <div class="toc">
tag = soup.find('div', {'class': 'toc'})  # id="toc" also works

# collect every link inside it: <a href='/path/to/div'>topic</a>
links = tag.find_all('a')

# print the text of each <a>
for link in links:
    print(link.text)
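
The snippet above assumes the page really contains a `<div class="toc">`. If it does not, `soup.find` returns `None` and the next call fails. Below is a minimal sketch of the same steps wrapped in a function that tolerates a missing TOC and also returns each link's `href`. The name `toc_entries` and the `SAMPLE_HTML` markup are made up for illustration and are not part of the original gist. The sketch uses the built-in `html.parser` so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded wiki page.
SAMPLE_HTML = """
<html><body>
<div id="toc" class="toc">
  <ul>
    <li><a href="#History">History</a></li>
    <li><a href="#Theory">Theory</a></li>
  </ul>
</div>
<p><a href="/wiki/Other">a link outside the TOC</a></p>
</body></html>
"""


def toc_entries(html, parser='html.parser'):
    """Return (text, href) pairs for every link inside <div class="toc">."""
    soup = BeautifulSoup(html, parser)
    tag = soup.find('div', {'class': 'toc'})
    if tag is None:
        # No TOC container on this page: return an empty list instead of failing.
        return []
    return [(a.get_text(strip=True), a.get('href')) for a in tag.find_all('a')]


# Print each TOC entry together with its anchor.
for text, href in toc_entries(SAMPLE_HTML):
    print(text, href)
```

To run it against a live page, pass `requests.get(url).content` as `html`. Whether a given page still exposes a `div` with class `toc` depends on the site's current markup, so an empty result is worth checking for.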