@amankharwal
Created Aug 19, 2020
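import re

def clean_text(df, text_field):
    # Lowercase first so the pattern below only ever sees lowercase text.
    df[text_field] = df[text_field].str.lower()
    # Remove @mentions (alphanumeric part), URLs, a leading "rt", and any character that is not a letter, digit, space, or tab.
    df[text_field] = df[text_field].apply(lambda elem: re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", elem))
    return df

# `test` and `train` are assumed to be pandas DataFrames with a "tweet" column, loaded elsewhere.
# Note: clean_text modifies the DataFrame in place and returns the same object.
test_clean = clean_text(test, "tweet")
train_clean = clean_text(train, "tweet")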
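
To see what the pattern actually keeps and drops, here is a minimal sketch that applies the same lowercasing and regex to single strings, without pandas. The helper name `clean_tweet` and the sample tweets are hypothetical and not part of the original snippet.

```python
import re

# Same pattern as in clean_text above, copied verbatim.
PATTERN = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?"

def clean_tweet(text):
    # Hypothetical single-string version of the per-row cleaning step.
    return re.sub(PATTERN, "", text.lower())

sample = "RT @user: Check this out! https://example.com/page #NLP"
cleaned = clean_tweet(sample)
print(repr(cleaned))    # '  check this out  nlp'
print(cleaned.split())  # ['check', 'this', 'out', 'nlp']

# Handles containing an underscore are only partly removed,
# because the mention pattern matches letters and digits only.
print(repr(clean_tweet("@data_fan hello")))  # 'fan hello'
```

The pattern deletes matches but does not collapse the whitespace around them, so the cleaned text can contain leading or doubled spaces. If that matters downstream, splitting on whitespace and rejoining with single spaces is one way to normalize it.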