@amankharwal
Created January 11, 2021 06:28
import re

import pandas as pd

f_data = pd.read_csv('vaccination_tweets.csv')
# Lowercase all tweet text
f_data.text = f_data.text.str.lower()
# Remove Twitter handles (@username)
f_data.text = f_data.text.apply(lambda x: re.sub(r'@\S+', '', x))
# Remove hashtags (#tag)
f_data.text = f_data.text.apply(lambda x: re.sub(r'\B#\S+', '', x))
# Remove URLs (any token starting with "http")
f_data.text = f_data.text.apply(lambda x: re.sub(r'http\S+', '', x))
# Remove special characters: keep only runs of word characters
# (letters, digits, underscore) and rejoin them with single spaces
f_data.text = f_data.text.apply(lambda x: ' '.join(re.findall(r'\w+', x)))
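# Remove standalone single letters (e.g. the "s" left over from "it's").
# Matching on word boundaries and replacing with '' leaves extra spaces
# instead of gluing the neighbouring words together; the next step collapses them.
f_data.text = f_data.text.apply(lambda x: re.sub(r'\b[a-zA-Z]\b', '', x))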
# Collapse runs of whitespace into a single space and trim both ends
f_data.text = f_data.text.apply(lambda x: re.sub(r'\s+', ' ', x).strip())
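The steps above can be collected into one function. This makes it easy to check the pipeline on a single string before running it over the whole file. This is a minimal sketch, not the author's code. The helper name `clean_tweet` and the sample tweet are hypothetical. The single-letter step uses the word-boundary pattern `\b[a-z]\b`, followed by a whitespace collapse, so that removing a letter never joins the words on either side of it.

```python
import re

import pandas as pd

# Precompiled patterns, one per cleaning step, applied in the order below.
HANDLE = re.compile(r'@\S+')              # Twitter handles such as @user
HASHTAG = re.compile(r'\B#\S+')           # hashtags such as #covid19
URL = re.compile(r'http\S+')              # http:// and https:// links
WORD = re.compile(r'\w+')                 # runs of letters, digits, underscore
SINGLE_LETTER = re.compile(r'\b[a-z]\b')  # standalone letters (text is lowercased first)
SPACES = re.compile(r'\s+')


def clean_tweet(text: str) -> str:
    """Hypothetical helper: apply every cleaning step to one tweet."""
    text = text.lower()
    text = HANDLE.sub('', text)
    text = HASHTAG.sub('', text)
    text = URL.sub('', text)
    text = ' '.join(WORD.findall(text))  # drops punctuation and emoji
    text = SINGLE_LETTER.sub('', text)
    return SPACES.sub(' ', text).strip()


# Usage on a small, made-up DataFrame that uses the same `text` column name.
df = pd.DataFrame({'text': [
    "Got my 1st dose today @NHS! It's a relief #vaccine https://t.co/abc",
]})
df['text'] = df['text'].map(clean_tweet)
print(df['text'].iloc[0])  # prints: got my 1st dose today it relief
```

With this design, `Series.map` runs all the steps for each row in one pass, instead of making one `.apply` pass over the column per step. Precompiling the patterns is mostly for readability, because the `re` module already caches recently used patterns.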