@jonahlyn
Forked from braveulysses/strip_tags.py
Last active December 17, 2015 22:49
Updated for BeautifulSoup4
from bs4 import BeautifulSoup


def strip(untrusted_html):
    """Strips out all tags from untrusted_html, leaving only text.

    Converts XML entities to Unicode characters. This is desirable because it
    reduces the likelihood that a filter further down the text processing chain
    will double-encode the XML entities."""
    # The "xml" parser is provided by lxml, which must be installed.
    soup = BeautifulSoup(untrusted_html, "xml")
    # Join every text node; all tags are dropped and entities arrive decoded.
    safe_html = ''.join(soup.find_all(string=True))
    return safe_html
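For comparison, here is a minimal sketch of the same idea using only the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique, not the gist's method: the names `_TextCollector` and `strip_stdlib` are hypothetical, and the HTML parser may treat malformed input differently from the "xml" parser used above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_stdlib(untrusted_html):
    """Return only the text of untrusted_html, with entities decoded."""
    collector = _TextCollector()
    collector.feed(untrusted_html)
    collector.close()  # flush any text still buffered by the parser
    return ''.join(collector.chunks)


print(strip_stdlib('<p>Fish &amp; <b>chips</b></p>'))  # prints "Fish & chips"
```

As with the BeautifulSoup version, the result is plain text with entities decoded, so it should be escaped again before being placed back into HTML.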