@leonardopinho
Created August 21, 2018 18:20
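from bs4 import BeautifulSoup


def clean_html(html_text):
    """
    Strip HTML markup from raw text and return its text content.

    :param html_text: string containing HTML markup
    :return: the text with all tags removed
    """
    # The "lxml" parser requires the lxml package to be installed.
    text = BeautifulSoup(html_text, "lxml").text
    return text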
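For environments where neither bs4 nor lxml is installed, the same idea can be roughly approximated with the standard library's html.parser module. This is only a sketch and not part of the original gist: the names _TextExtractor and clean_html_stdlib are hypothetical, and the output may differ from BeautifulSoup's, since this parser does not repair malformed markup and keeps the contents of script and style elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes seen while parsing (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &amp;
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def clean_html_stdlib(html_text):
    """Return the text of html_text with tags removed, using only the stdlib."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)
```

Calling clean_html_stdlib("<p>Hello <b>world</b>!</p>") returns "Hello world!".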