@aniversarioperu
Created February 1, 2014 16:41
Extract all the greetings signed by Congressman Molina
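# -*- coding: utf-8 -*-
import glob

from bs4 import BeautifulSoup

# Process every saved HTML page in the current directory
for filename in glob.glob("*html"):
    # The pages are read as Latin-1
    with open(filename, "r", encoding="latin1") as f:
        html_doc = f.read()

    # Name the parser explicitly so results do not depend on what is installed
    soup = BeautifulSoup(html_doc, "html.parser")

    # Check each <td width="150"> cell for the signer's name
    for tag in soup.find_all("td", width="150"):
        if "Molina" in tag.text:
            # Step back four elements in document order to reach the greeting
            prev = tag.previous_element
            prev2 = prev.previous_element
            prev3 = prev2.previous_element
            print(prev3.previous_element)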
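
The script above reaches the greeting by stepping back four times with previous_element, so it depends on the exact node layout of each page. Below is a minimal sketch of an alternative lookup. It assumes, hypothetically, that the greeting sits in the <td> immediately before the signature cell in the same row; the gist does not show the real page layout. SAMPLE_HTML and greetings_signed_by are illustrative names, not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real pages may differ.
SAMPLE_HTML = (
    "<table>"
    '<tr><td>Feliz aniversario, Lima</td><td width="150">Congresista Molina</td></tr>'
    '<tr><td>Saludos a Cusco</td><td width="150">Congresista Otro</td></tr>'
    "</table>"
)


def greetings_signed_by(html, name):
    """Return the text of the cell preceding each signature cell that mentions `name`."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for cell in soup.find_all("td", width="150"):
        if name in cell.get_text():
            # Nearest earlier <td> in the same row (assumed to hold the greeting)
            greeting_cell = cell.find_previous_sibling("td")
            if greeting_cell is not None:
                results.append(greeting_cell.get_text(strip=True))
    return results


if __name__ == "__main__":
    for greeting in greetings_signed_by(SAMPLE_HTML, "Molina"):
        print(greeting)
```

If the real pages place the greeting elsewhere (for example in a previous row), the sibling lookup would need to be adapted, and the fixed-step traversal in the script above may remain the simpler choice.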