@edison12a
Forked from bradmontgomery/kill_attrs.py
Created April 3, 2018 13:15
A way to remove all HTML attributes with BeautifulSoup
# Written for Python 2 and BeautifulSoup 3 (the legacy `BeautifulSoup` package).
from BeautifulSoup import BeautifulSoup


def _remove_attrs(soup):
    # findAll(True) matches every tag in the parsed document.
    for tag in soup.findAll(True):
        tag.attrs = None  # drop all attributes on this tag
    return soup


def example():
    doc = '<html><head><title>test</title></head><body id="foo" onload="whatever"><p class="whatever">junk</p><div style="background: yellow;" id="foo" class="blah">blah</div></body></html>'
    print 'Before:\n%s' % doc
    soup = BeautifulSoup(doc)
    clean_soup = _remove_attrs(soup)
    print '\nAfter:\n%s' % clean_soup
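
The snippet above imports from the legacy `BeautifulSoup` package and uses Python 2 `print` statements. As a rough sketch of the same idea on Python 3 with the `bs4` package (Beautiful Soup 4), something like the following should work. It assumes `bs4` is installed and uses the standard-library `html.parser` backend. The name `remove_attrs` and the choice of resetting `attrs` to an empty dict (rather than `None`) are assumptions of this sketch, not part of the original gist.

```python
from bs4 import BeautifulSoup


def remove_attrs(soup):
    """Strip every attribute from every tag in the parsed tree, in place."""
    # find_all(True) matches every tag in the document.
    for tag in soup.find_all(True):
        # Beautiful Soup 4 keeps a tag's attributes in a dict,
        # so an empty dict means "no attributes".
        tag.attrs = {}
    return soup


doc = (
    '<html><head><title>test</title></head>'
    '<body id="foo" onload="whatever"><p class="whatever">junk</p>'
    '<div style="background: yellow;" id="foo" class="blah">blah</div>'
    '</body></html>'
)

print('Before:\n%s' % doc)
soup = BeautifulSoup(doc, 'html.parser')
cleaned = str(remove_attrs(soup))
print('\nAfter:\n%s' % cleaned)
```

The function modifies the tree in place and returns the same object, mirroring the original. If only certain attributes should go (for example `style` and `onload`), deleting those keys from `tag.attrs` would be the more selective variant.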