@susanli2016
Created September 29, 2018 15:50
import numpy as np
import pandas as pd

# `us` is assumed to be an existing DataFrame with a binary `click_bool` column.
# Keep every click row (the minority class), in shuffled order.
click_indices = us[us.click_bool == 1].index
random_indices = np.random.choice(click_indices, len(click_indices), replace=False)
click_sample = us.loc[random_indices]

# Randomly undersample the non-click rows down to the number of click rows.
not_click = us[us.click_bool == 0].index
random_indices = np.random.choice(not_click, len(click_indices), replace=False)
not_click_sample = us.loc[random_indices]

# Combine both samples into a balanced dataset.
us_new = pd.concat([not_click_sample, click_sample], axis=0)

print("Fraction of non-click impressions: ", len(us_new[us_new.click_bool == 0])/len(us_new))
print("Fraction of click impressions: ", len(us_new[us_new.click_bool == 1])/len(us_new))
print("Total number of records in resampled data: ", len(us_new))
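
The snippet above depends on a DataFrame `us` that is not defined in the gist. As a minimal, self-contained sketch of the same random undersampling idea, the example below builds a small toy frame and balances it. The toy data, the `impression_id` column, and the `undersample` helper are hypothetical names introduced for illustration only; they are not part of the original gist.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `us`: 1,000 impressions, roughly 5% clicks.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "impression_id": np.arange(1000),
    "click_bool": (rng.random(1000) < 0.05).astype(int),
})


def undersample(df, label="click_bool", seed=0):
    """Return a balanced frame by sampling each class down to the minority size."""
    n_minority = df[label].value_counts().min()
    # Draw n_minority rows from each class without replacement, then stack them.
    parts = [
        group.sample(n=n_minority, random_state=seed)
        for _, group in df.groupby(label)
    ]
    return pd.concat(parts, axis=0)


balanced = undersample(toy)
print(balanced["click_bool"].value_counts())
```

Passing `random_state` makes the sample reproducible, which the `np.random.choice` calls in the gist are not unless a seed is set beforehand.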