@aniruddha27
Created August 15, 2020 05:38
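# Cleaning the tweets
import re

from nltk.stem import WordNetLemmatizer  # requires the NLTK 'wordnet' corpus: nltk.download('wordnet')

def preprocess(tweet):
    # remove links
    tweet = re.sub(r'http\S+', '', tweet)
    # remove mentions
    tweet = re.sub(r'@\w+', '', tweet)
    # keep only letters and hashtags; digits and punctuation become spaces
    tweet = re.sub(r'[^a-zA-Z#]', ' ', tweet)
    # collapse runs of whitespace into a single space
    tweet = re.sub(r'\s+', ' ', tweet)
    tweet = tweet.lower()
    # lemmatize each word and keep only lemmas longer than 3 characters
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(w) for w in tweet.split()]
    sent = ' '.join(lemma for lemma in lemmas if len(lemma) > 3)
    return sent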
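
To see what the regex steps do without downloading the WordNet data, they can be run on their own. The sketch below is only an illustration: `clean_text` and the sample tweet are hypothetical and not part of the gist, and the lemmatization and length filter are deliberately left out.

```python
import re

def clean_text(tweet):
    """Apply only the regex steps of the cleaning function (no lemmatization)."""
    tweet = re.sub(r"http\S+", "", tweet)      # drop links
    tweet = re.sub(r"@\w+", "", tweet)         # drop mentions
    tweet = re.sub(r"[^a-zA-Z#]", " ", tweet)  # keep letters and '#'
    tweet = re.sub(r"\s+", " ", tweet)         # collapse whitespace
    return tweet.lower().strip()

# Hypothetical sample tweet, made up for this example
sample = "Loving the new phone!!! https://t.co/abc123 @TechGuru #gadgets 2020"
cleaned = clean_text(sample)
print(cleaned)  # loving the new phone #gadgets
```

Because digits are replaced with spaces, tokens such as years or version numbers disappear entirely. That may or may not be what a given analysis wants.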