@intrd
Last active March 3, 2017 05:59
Wikipedia parser (birth data extractor)
## Wikipedia parser (birth data extractor)
# @author intrd - http://dann.com.br/ (based on @JBernardo's suggestion http://stackoverflow.com/a/12250675)
# @license Creative Commons Attribution-ShareAlike 4.0 International License - http://creativecommons.org/licenses/by-sa/4.0/
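import re
import requests
from bs4 import BeautifulSoup

def wikiGet(name):
    # Fetch the wikitext of the lead section (section 0) as XML.
    url = 'https://en.wikipedia.org/w/api.php'
    params = {'action': 'query', 'prop': 'revisions', 'rvprop': 'content',
              'rvsection': 0, 'titles': name, 'format': 'xml'}
    res = requests.get(url, params=params)
    soup = BeautifulSoup(res.text, "xml")  # the "xml" parser requires lxml
    #print(soup.getText())
    if soup.revisions is None:
        return None  # page not found
    # Grab the {{Birth date ...}} template and split it into its fields.
    birth_re = re.search(r'(Birth date(.*?)}})', soup.revisions.getText())
    if birth_re is None:
        return None  # no Birth date template in the lead section
    birth_data = birth_re.group(0).split('|')
    print(birth_data)
    # A leading parameter such as df=yes shifts the year to the third field.
    if len(birth_data) > 2 and len(birth_data[2]) == 4:
        return birth_data[2]
    else:
        return birth_data[1]
#dyear = wikiGet("Albert_Einstein")
#dyear = wikiGet("Daniel_Bleichenbacher")
#print(dyear)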
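
The field-splitting step in wikiGet can be tried without a network call. The following is a minimal sketch, not part of the original script: the helper name birth_year_from_wikitext and the BIRTH_RE pattern are hypothetical, and it assumes the year is the first four-digit field in the template, so years with fewer than four digits are not handled.

```python
import re

# Hypothetical pattern (an assumption, not from the gist): matches templates such as
# {{Birth date|df=yes|1879|3|14}} or {{birth date and age|1955|10|28}}.
BIRTH_RE = re.compile(r'\{\{\s*birth[ _]date[^|}]*\|([^}]*)\}\}', re.IGNORECASE)

def birth_year_from_wikitext(wikitext):
    """Return the first four-digit field of a Birth date template, or None."""
    match = BIRTH_RE.search(wikitext)
    if match is None:
        return None
    # Named parameters such as df=yes are skipped because they are not four digits.
    for field in match.group(1).split('|'):
        field = field.strip()
        if re.fullmatch(r'\d{4}', field):
            return field
    return None
```

Scanning for the first four-digit field, rather than checking a fixed position, avoids depending on whether a parameter like df=yes precedes the year.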