@aaronkub
Last active March 8, 2019 20:44
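import re

# Punctuation that is deleted outright.
REPLACE_NO_SPACE = re.compile(r"[.;:!'?,\"()\[\]]")
# Doubled HTML line breaks, hyphens, and slashes are replaced by one space.
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")

def preprocess_reviews(reviews):
    """Lowercase each review, strip punctuation, and turn separators into spaces."""
    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]
    return reviews

# reviews_train and reviews_test are assumed to be lists of review strings
# loaded earlier (not shown in this gist).
reviews_train_clean = preprocess_reviews(reviews_train)
reviews_test_clean = preprocess_reviews(reviews_test)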
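
The snippet above depends on reviews_train and reviews_test, which are not defined in this gist. As a rough, self-contained sketch of what the cleaning step does, the example below runs the same two regular expressions on two made-up review strings. The sample list and the names sample_reviews and cleaned are illustrative assumptions, not part of the original code.

```python
import re

# Same patterns as in the gist, written as raw strings.
REPLACE_NO_SPACE = re.compile(r"[.;:!'?,\"()\[\]]")
REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")


def preprocess_reviews(reviews):
    """Lowercase each review, strip punctuation, and turn separators into spaces."""
    reviews = [REPLACE_NO_SPACE.sub("", line.lower()) for line in reviews]
    reviews = [REPLACE_WITH_SPACE.sub(" ", line) for line in reviews]
    return reviews


# Hypothetical input, invented purely for illustration.
sample_reviews = [
    "This movie was GREAT!<br /><br />Loved the well-known cast.",
    "Don't bother; it's a 5/10 (at best).",
]

cleaned = preprocess_reviews(sample_reviews)
for line in cleaned:
    print(line)
# this movie was great loved the well known cast
# dont bother its a 5 10 at best
```

Note that punctuation is removed before separators are replaced, so apostrophes collapse words ("don't" becomes "dont") while hyphens and slashes split them ("well-known" becomes "well known").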