@mmmayo13
Created March 23, 2018 15:41
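import re
from bs4 import BeautifulSoup

def strip_html(text):
    # Parse the markup and keep only its text content.
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()

def remove_between_square_brackets(text):
    # Drop every "[...]" span, brackets included.
    return re.sub(r'\[[^]]*\]', '', text)

def denoise_text(text):
    text = strip_html(text)
    text = remove_between_square_brackets(text)
    return text

# `sample` is assumed to be a string defined before this snippet runs.
sample = denoise_text(sample)
print(sample)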
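
To see what the square-bracket pattern removes, here is a minimal sketch that runs the same regular expression on a made-up string. The `example` text is hypothetical and not part of the original gist. It uses only the standard library.

```python
import re

def remove_between_square_brackets(text):
    # Same pattern as above: "[", any run of characters other than "]", then "]".
    return re.sub(r'\[[^]]*\]', '', text)

# Hypothetical input for illustration only.
example = "Title [edit] of a page [1] with notes"
cleaned = remove_between_square_brackets(example)
print(cleaned)
```

The spaces on either side of each removed span are kept, so the result can contain doubled spaces that a later whitespace-normalization step might need to collapse.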