@dalequark
Created January 29, 2019 17:45
Bucket Politeness Data
# We'll consider everything with a Normalized Score at or below the 25th percentile to be rude.
# Scores at or above the 75th percentile are considered polite.
# Scores in between are considered neutral.
# Get the 25th and 75th percentiles of the Normalized Scores
rude_thresh = data.describe().loc["25%"]['Normalized Score']
polite_thresh = data.describe().loc["75%"]['Normalized Score']
label_list = ["rude", "neutral", "polite"]
def score_to_label(score):
    if score <= rude_thresh:
        return label_list[0]
    if score < polite_thresh:
        return label_list[1]
    return label_list[2]
# Give the 'data' DataFrame a new column that contains string labels
data['label'] = data['Normalized Score'].map(score_to_label)
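
A minimal, self-contained sketch of the same bucketing on toy data, for anyone who wants to try it without the original dataset. The scores below are made up for illustration and are not from the politeness corpus. `Series.quantile` is swapped in for `describe()` here as an assumption of convenience; with pandas' default linear interpolation the two give the same thresholds.

```python
import pandas as pd

# Hypothetical toy scores standing in for the gist's `data` DataFrame.
data = pd.DataFrame(
    {"Normalized Score": [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]}
)

# 25th and 75th percentiles of the score column.
rude_thresh = data["Normalized Score"].quantile(0.25)
polite_thresh = data["Normalized Score"].quantile(0.75)

label_list = ["rude", "neutral", "polite"]


def score_to_label(score):
    # At or below the lower threshold is rude, at or above the upper
    # threshold is polite, everything in between is neutral.
    if score <= rude_thresh:
        return label_list[0]
    if score < polite_thresh:
        return label_list[1]
    return label_list[2]


# Add a column of string labels, as in the gist.
data["label"] = data["Normalized Score"].map(score_to_label)

# Show how many rows landed in each bucket.
print(data["label"].value_counts())
```

Because the thresholds are percentiles of the data itself, roughly a quarter of the rows end up in each of the rude and polite buckets and about half in neutral, whatever the scale of the scores.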