@robsmith1776
Forked from bradmontgomery/kill_attrs.py
Created November 19, 2015 16:36
A way to remove all HTML attributes with BeautifulSoup
# Written for BeautifulSoup 3 on Python 2 (note the import path and print statements).
from BeautifulSoup import BeautifulSoup


def _remove_attrs(soup):
    # findAll(True) matches every tag in the tree.
    for tag in soup.findAll(True):
        tag.attrs = []  # BeautifulSoup 3 stores attributes as a list of (name, value) pairs
    return soup


def example():
    doc = '<html><head><title>test</title></head><body id="foo" onload="whatever"><p class="whatever">junk</p><div style="background: yellow;" id="foo" class="blah">blah</div></body></html>'
    print 'Before:\n%s' % doc
    soup = BeautifulSoup(doc)
    clean_soup = _remove_attrs(soup)
    print '\nAfter:\n%s' % clean_soup
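
The snippet above targets the old BeautifulSoup 3 package. What follows is a rough sketch of the same idea for Python 3, assuming the `bs4` package (Beautiful Soup 4) is installed and using the standard library's `html.parser` backend. The name `remove_attrs` and the constant `DOC` are illustrative and not part of the original gist. In Beautiful Soup 4, `tag.attrs` is a dict, so the sketch assigns an empty dict rather than `None`.

```python
from bs4 import BeautifulSoup

# Same sample document as in the original example.
DOC = (
    '<html><head><title>test</title></head>'
    '<body id="foo" onload="whatever"><p class="whatever">junk</p>'
    '<div style="background: yellow;" id="foo" class="blah">blah</div>'
    '</body></html>'
)


def remove_attrs(soup):
    """Strip every attribute from every tag in the parsed tree, in place."""
    # find_all(True) matches all tags, like findAll(True) in BeautifulSoup 3.
    for tag in soup.find_all(True):
        tag.attrs = {}  # Beautiful Soup 4 stores attributes as a dict
    return soup


if __name__ == "__main__":
    print("Before:\n%s" % DOC)
    clean_soup = remove_attrs(BeautifulSoup(DOC, "html.parser"))
    print("\nAfter:\n%s" % clean_soup)
```

Passing the parser name explicitly avoids the warning Beautiful Soup 4 emits when it has to guess a parser, and keeps the output independent of whether `lxml` happens to be installed.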