@piEsposito
Created April 28, 2020 17:08
#import pandas and load the reviews dataset
import pandas as pd

df = pd.read_csv('data/reviews.csv')
#drop columns that are not used for sentiment classification
df = df.drop(['Id', 'ProductId', 'UserId', 'ProfileName', 'HelpfulnessNumerator',
              'HelpfulnessDenominator', 'Time', 'Summary'], axis=1)
#remove ambiguous 3 and 4 stars for balancing
df = df[~df['Score'].isin([3, 4])]
#create labels (5 stars -> positive, anything else -> negative) and lowercase the text
df['Score'] = df['Score'].apply(lambda i: 'positive' if i > 4 else 'negative')
df['Text'] = df['Text'].str.lower()
#rename the remaining columns (Score, Text) for readability
df.columns = ['labels', 'text']