@alejio
Created July 13, 2016 19:07
Python, Pandas: Use a Monte Carlo method to keep a random subset of records for each value of the primary key
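import numpy as np
import pandas as pd

# df, a_list, a_number and prob_keep are expected to be defined beforehand.
pieces = []
for elem in a_list:
    df_temp = df[df.pkey == elem]
    if len(df_temp) > a_number:
        # Keep each row independently with probability prob_keep (one Bernoulli draw per row).
        keep = np.random.binomial(1, prob_keep, size=len(df_temp)) == 1
        df_temp = df_temp[keep]
    pieces.append(df_temp)
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once at the end.
df_out = pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame(columns=df.columns)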
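The same thinning can probably be done without an explicit loop. The following is only a sketch under stated assumptions: the function name `thin_large_groups` and its parameters `keys`, `min_rows`, `key_col` and `seed` are hypothetical names introduced here (standing in for `a_list`, `a_number` and the `pkey` column above), and NumPy's `default_rng` is swapped in for `np.random.binomial`. Unlike the loop, it keeps rows in their original order rather than grouping them in the order of the key list.

```python
import numpy as np
import pandas as pd


def thin_large_groups(df, keys, min_rows, prob_keep, key_col="pkey", seed=None):
    """Keep each row of an oversized key group with probability prob_keep.

    Hypothetical helper: the name and all parameter names are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Restrict to the requested key values, as the loop over the key list does.
    subset = df[df[key_col].isin(keys)]
    # Size of the group each row belongs to, aligned with subset's index.
    group_size = subset.groupby(key_col)[key_col].transform("size")
    # One uniform draw per row; rows in groups of at most min_rows are always kept.
    keep = (group_size <= min_rows) | (rng.random(len(subset)) < prob_keep)
    return subset[keep].reset_index(drop=True)


# Example data (made up for illustration): one large group and one small group.
df = pd.DataFrame({"pkey": ["a"] * 100 + ["b"] * 3, "value": range(103)})
out = thin_large_groups(df, keys=["a", "b"], min_rows=10, prob_keep=0.2, seed=0)
```

Here group `"b"` has only 3 rows, so it is always kept whole, while each row of group `"a"` survives with probability 0.2. Passing `seed` makes the draw reproducible.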