@p53ud0k0d3
Created April 6, 2017 14:11
How to clean up tweets
import re

# tweet is a single tweet (str)
def clean_tweet(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet)  # links (whole URL, not just the scheme)
    tweet = re.sub(r"#\S+", "", tweet)  # hashtags
    tweet = re.sub(r"\.?@", "", tweet)  # at-mention markers (the username is kept)
    tweet = re.sub(r"\bRT\b.*", "", tweet)  # retweets: "RT" and the rest of that line
    tweet = re.sub(r"Video:", "", tweet)  # "Video:" labels
    tweet = re.sub(r"\n", " ", tweet)  # new lines (a space keeps words apart)
    tweet = re.sub(r"^\.?\s+", "", tweet)  # leading period and whitespace
    tweet = re.sub(r"\s+", " ", tweet)  # extra whitespace
    tweet = re.sub(r"&amp;", "and", tweet)  # encoded ampersands
    return tweet
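The same cleanup can also be written as an ordered table of precompiled patterns, which may help when the function runs over many tweets. This is only a sketch under a few assumptions: that whole URLs should be dropped, that `&amp;` is the encoded ampersand, and that surrounding whitespace should be stripped at the end. The names `CLEANUP_RULES` and `clean_tweet_compiled` are hypothetical and not part of the gist.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement), applied in order.
CLEANUP_RULES = [
    (re.compile(r"https?://\S+"), ""),  # links (assumes the whole URL should go)
    (re.compile(r"#\S+"), ""),          # hashtags
    (re.compile(r"\.?@"), ""),          # at-mention markers; the username stays
    (re.compile(r"\bRT\b.*"), ""),      # "RT" and the rest of that line
    (re.compile(r"Video:"), ""),        # "Video:" labels
    (re.compile(r"&amp;"), "and"),      # encoded ampersands
    (re.compile(r"\s+"), " "),          # newlines and repeated whitespace
]


def clean_tweet_compiled(tweet):
    """Apply every cleanup rule in order and trim the result."""
    for pattern, replacement in CLEANUP_RULES:
        tweet = pattern.sub(replacement, tweet)
    return tweet.strip()


if __name__ == "__main__":
    sample = "Video: Cats &amp; dogs\nat play https://example.com/x #pets"
    print(clean_tweet_compiled(sample))  # prints: Cats and dogs at play
```

Rule order matters here: links are removed before hashtags so that a URL fragment such as `#section` disappears with its URL, and whitespace is collapsed last so that gaps left by earlier removals are cleaned up.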