@braveulysses
Created May 29, 2009 20:24
Strip HTML tags using BeautifulSoup
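from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3.x module layout


def strip(untrusted_html):
    """Strips out all tags from untrusted_html, leaving only text.

    Converts XML entities to Unicode characters. This is desirable because it
    reduces the likelihood that a filter further down the text processing chain
    will double-encode the XML entities.
    """
    # ALL_ENTITIES asks the parser to convert entities to Unicode while parsing.
    soup = BeautifulStoneSoup(untrusted_html, convertEntities=BeautifulStoneSoup.ALL_ENTITIES)
    # findAll(text=True) returns every text node; joining them drops the markup.
    safe_html = ''.join(soup.findAll(text=True))
    return safe_html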
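The snippet above depends on BeautifulSoup 3. For readers who cannot install it, the same idea (keep the text nodes, discard the tags, decode entities) can be roughly sketched with only the Python 3 standard library. This is an illustrative alternative, not the gist author's method: the names `_TextCollector` and `strip_stdlib` are hypothetical, and the result may differ from BeautifulSoup's on malformed markup, comments, and other edge cases.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # handle_data is called, so no separate unescape step is needed.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_stdlib(untrusted_html):
    """Return only the text content of untrusted_html (a sketch)."""
    parser = _TextCollector()
    parser.feed(untrusted_html)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.chunks)
```

As with the original function, the return value is plain text with entities decoded, so it needs to be escaped again before being placed back into an HTML page.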