@jschaub30
Created July 13, 2015 16:26
Python script using requests and BeautifulSoup
#!/usr/bin/python
'''
Read a WordPress post and parse out each handle and email.
Uses requests and BeautifulSoup.
'''
import requests
from bs4 import BeautifulSoup
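r = requests.get('http://arlab093.austin.ibm.com/blog/?p=2742')
soup = BeautifulSoup(r.text, 'html.parser')
# The post body is the first <div class="postcontent"> on the page.
content = soup.find_all('div', {'class': 'postcontent'})[0]
links = content.find_all('a')
num_links = len(links)
# Links are expected to alternate handle, email, so the count must be even.
assert num_links % 2 == 0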
result = []
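# Integer division keeps range() working on Python 3 as well as Python 2.
for k in range(num_links // 2):
    handle = links[2 * k].string
    email = links[2 * k + 1].string
    result.append((handle, email))
for handle, email in result:
    # A parenthesized print of one string works on both Python 2 and 3.
    print('%s: %s' % (handle, email))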
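
The index arithmetic in the loop above pairs each even-positioned link with the one that follows it. A minimal sketch of that same pairing step on its own, using slicing and zip, might look like the following. The function name `pair_consecutive` and the sample values are hypothetical and are not part of the original script; plain strings stand in for the link text so the sketch runs without network access.

```python
def pair_consecutive(items):
    """Pair items as (items[0], items[1]), (items[2], items[3]), ...

    Hypothetical helper for illustration; mirrors the handle/email
    pairing done by index in the script above.
    """
    if len(items) % 2 != 0:
        raise ValueError('expected an even number of items, got %d' % len(items))
    # Even positions are handles, odd positions are emails.
    return list(zip(items[0::2], items[1::2]))


# Made-up sample data standing in for the .string of each link.
sample = ['alice', 'alice@example.com', 'bob', 'bob@example.com']
for handle, email in pair_consecutive(sample):
    print('%s: %s' % (handle, email))
```

Raising ValueError instead of using assert is a design choice in this sketch: assert statements are skipped when Python runs with -O, while an explicit exception is always raised.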