Created March 23, 2018 15:41
import re
from bs4 import BeautifulSoup

def strip_html(text):
    # Parse the markup and keep only the text content
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()

def remove_between_square_brackets(text):
    # Drop bracketed spans such as [1] or [edit]
    return re.sub(r'\[[^]]*\]', '', text)

def denoise_text(text):
    text = strip_html(text)
    text = remove_between_square_brackets(text)
    return text

# `sample` is assumed to be defined earlier (the raw text to clean)
sample = denoise_text(sample)
print(sample)
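
To see what the bracket-stripping step does on its own, here is a minimal sketch using only the standard library. The input string `example` is made up for illustration and is not part of the original gist; the function body mirrors the pattern used above.

```python
import re

def remove_between_square_brackets(text):
    # Match "[", then any run of characters other than "]", then "]"
    return re.sub(r'\[[^]]*\]', '', text)

# Hypothetical input, chosen only to illustrate the pattern
example = "Python [1] is popular [citation needed]."
cleaned = remove_between_square_brackets(example)
print(cleaned)

# An opening bracket with no closing bracket is left untouched
unclosed = remove_between_square_brackets("no closing [bracket")
print(unclosed)
```

Note that the pattern removes only the bracketed span itself, so the whitespace around it stays in place; a later normalization step would be needed if doubled spaces matter.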