Created April 6, 2017 14:11 (gist p53ud0k0d3/32718d793aaffdddbbb8d1f90da4201f)
How to clean up tweets
import re

# tweet is a single tweet (a str)
def clean_tweet(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet)   # links (the whole URL, not just the scheme)
    tweet = re.sub(r"#\S+", "", tweet)           # hashtags
    tweet = re.sub(r"\.?@", "", tweet)           # at mentions: drop "@" or ".@", keep the username
    tweet = re.sub(r"\bRT\b.*", "", tweet)       # retweets: "RT" and the rest of that line
    tweet = re.sub(r"Video:", "", tweet)         # "Video:" labels
    tweet = re.sub(r"\n", " ", tweet)            # new lines (a space, so words don't merge)
    tweet = re.sub(r"^\.?\s+", "", tweet)        # leading ". " or whitespace
    tweet = re.sub(r"\s+", " ", tweet)           # extra whitespace
    tweet = re.sub(r"&(?:amp;)?", "and", tweet)  # ampersands, HTML-encoded (&amp;) or bare
    return tweet.strip()                         # trailing whitespace left by removals
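The same cleanup steps can also be written as an ordered table of precompiled patterns, so the rules are easy to reorder or extend and each pattern is compiled once instead of being looked up on every call. This is a minimal sketch, assuming Python 3.9+ and tweets as plain str values; the names CLEANUP_RULES and clean_tweets are hypothetical, and trimming with str.strip() stands in for the leading-whitespace regex.

```python
import re
from collections.abc import Iterable

# Hypothetical rule table: (compiled pattern, replacement), applied in order.
CLEANUP_RULES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"https?://\S+"), ""),    # links
    (re.compile(r"#\S+"), ""),            # hashtags
    (re.compile(r"\.?@"), ""),            # "@" or ".@" prefix of mentions
    (re.compile(r"\bRT\b.*"), ""),        # "RT" and the rest of that line
    (re.compile(r"Video:"), ""),          # "Video:" labels
    (re.compile(r"\n"), " "),             # new lines become spaces
    (re.compile(r"&(?:amp;)?"), "and"),   # ampersands, encoded or bare
    (re.compile(r"\s+"), " "),            # collapse runs of whitespace
]


def clean_tweet(tweet: str) -> str:
    """Apply every rule in order, then trim leading/trailing whitespace."""
    for pattern, replacement in CLEANUP_RULES:
        tweet = pattern.sub(replacement, tweet)
    return tweet.strip()


def clean_tweets(tweets: Iterable[str]) -> list[str]:
    """Clean a batch of tweets (hypothetical helper), dropping any that end up empty."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [t for t in cleaned if t]


if __name__ == "__main__":
    sample = [
        "Loving the new release! https://t.co/abc123 #python",
        ".@guido thanks &amp; congrats\nsee you soon",
        "RT @someone: this whole retweet is dropped",
    ]
    for line in clean_tweets(sample):
        print(line)
```

Run as a script, this prints the first two sample tweets cleaned ("Loving the new release!" and "guido thanks and congrats see you soon"); the retweet is reduced to an empty string and filtered out.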