@amankharwal
Created August 19, 2020 11:00
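import re

# Matches @mentions, non-alphanumeric characters, URLs, and a leading "rt" marker.
NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt\b|http\S+")

def clean_text(df, text_field):
    # Lowercase first so the pattern only needs to match "rt" and "http" in lower case.
    df[text_field] = df[text_field].str.lower()
    # Series.str.replace leaves missing values alone, unlike apply() with re.sub.
    df[text_field] = df[text_field].str.replace(NOISE, "", regex=True)
    return df  # df is modified in place and returned for convenience

# `test` and `train` are assumed to be DataFrames with a "tweet" column, loaded beforehand.
test_clean = clean_text(test, "tweet")
train_clean = clean_text(train, "tweet")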
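
A quick way to see what the cleaning step does is to run it on a couple of made-up tweets. The sketch below is only illustrative: the `sample` DataFrame and its contents are hypothetical, and `clean_text` is repeated in a lightly tightened form (using `Series.str.replace`) so the snippet runs on its own.

```python
import re

import pandas as pd

# Same idea as the gist's pattern: @mentions, non-alphanumerics, URLs, leading "rt".
NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt\b|http\S+")


def clean_text(df, text_field):
    # Lowercase, then strip everything the pattern matches.
    df[text_field] = df[text_field].str.lower()
    df[text_field] = df[text_field].str.replace(NOISE, "", regex=True)
    return df


# Hypothetical sample data, not taken from the original train/test sets.
sample = pd.DataFrame(
    {"tweet": ["@user Check THIS out!! http://t.co/abc #happy", "RT hello world"]}
)

cleaned = clean_text(sample, "tweet")
for text in cleaned["tweet"]:
    print(repr(text))
```

The removed pieces leave their surrounding spaces behind, so the cleaned strings can start with a space or contain double spaces. If that matters downstream, one option is to follow up with `df[text_field].str.split().str.join(" ")` to collapse the whitespace.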