@suriyadeepan
Created August 26, 2016 06:20
Scrape all references from a Wikipedia page using Beautiful Soup
from bs4 import BeautifulSoup
import requests
url = 'https://en.wikipedia.org/wiki/Transhumanism'
# get contents from url
content = requests.get(url).content
# get soup
soup = BeautifulSoup(content, 'lxml')  # use the lxml parser (requires the lxml package)
# find every <span class="reference-text"> element (one per reference)
ref_tags = soup.find_all('span', class_='reference-text')
# iterate through the ResultSet
for i, ref_tag in enumerate(ref_tags):
    # print text only
    print('[{0}] {1}'.format(i, ref_tag.text))
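The script above needs the third-party bs4, lxml, and requests packages plus network access. Where those are unavailable, the same extraction step can be approximated with the standard library alone. The following is a minimal sketch, not the gist author's method: it swaps Beautiful Soup for `html.parser.HTMLParser`, and the names `ReferenceTextParser`, `extract_references`, and the `sample` markup are all hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class ReferenceTextParser(HTMLParser):
    """Collect the text inside every <span class="reference-text"> element."""

    def __init__(self):
        super().__init__()
        self.references = []  # finished reference strings
        self._depth = 0       # <span> nesting depth inside the current reference
        self._chunks = []     # text pieces of the reference being read

    def handle_starttag(self, tag, attrs):
        if tag != 'span':
            return
        if self._depth > 0:
            # a nested <span> inside a reference: track it so its
            # closing tag is not mistaken for the end of the reference
            self._depth += 1
        else:
            classes = (dict(attrs).get('class') or '').split()
            if 'reference-text' in classes:
                self._depth = 1
                self._chunks = []

    def handle_endtag(self, tag):
        if tag == 'span' and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.references.append(''.join(self._chunks))

    def handle_data(self, data):
        # keep text only while inside a reference span
        if self._depth > 0:
            self._chunks.append(data)


def extract_references(html):
    """Return the text of each reference span found in an HTML string."""
    parser = ReferenceTextParser()
    parser.feed(html)
    parser.close()
    return parser.references


# hypothetical markup, loosely shaped like a Wikipedia reference list
sample = (
    '<ol><li><span class="reference-text">Doe, J. <i>A Title</i>.</span></li>'
    '<li><span class="mw-cite-backlink">^</span> '
    '<span class="reference-text">Roe, R. <span class="nowrap">p. 5</span></span></li></ol>'
)

for i, ref in enumerate(extract_references(sample)):
    print('[{0}] {1}'.format(i, ref))
```

Because `extract_references` takes a plain HTML string, it can be tried on saved pages or small snippets without fetching anything. For real pages, Beautiful Soup is likely the more forgiving choice when the markup is malformed.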