@gautham20
Last active October 14, 2020 16:47
Stratified sampling of data such that the distribution of the grouped column in the sample is almost the same as in the original data
# Stratified sampling of data such that the distribution of the grouped column
# in the sample is almost the same as in the original data.
from functools import partial
import numpy as np

def group_sampler(group_data, total_df_len, n_samples):
    # Draw rows from this group in proportion to its share of the full DataFrame.
    # Rounding up keeps every group represented, so the total may slightly exceed n_samples.
    return group_data.sample(n=int(np.ceil((len(group_data) / total_df_len) * n_samples)))

group_sampler_200 = partial(group_sampler, total_df_len=len(filtered_cells), n_samples=200)
filtered_200_cells = filtered_cells.groupby('group_column', as_index=False).apply(group_sampler_200)
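A quick way to check that the sampler preserves the group distribution is to run it on toy data and compare group shares before and after. The sketch below is only an illustration under stated assumptions: the `cells` DataFrame, its 60/30/10 split, and the `random_state` argument are hypothetical and not part of the original gist. It also iterates over the groups and concatenates the pieces instead of calling `groupby(...).apply(...)`, because how `apply` treats the grouping column has changed between pandas versions.

```python
from functools import partial

import numpy as np
import pandas as pd


def group_sampler(group_data, total_df_len, n_samples, random_state=None):
    # Rows to draw from this group, proportional to its share of the data.
    # Multiplying before dividing keeps the arithmetic exact for these counts.
    n = int(np.ceil(len(group_data) * n_samples / total_df_len))
    return group_data.sample(n=n, random_state=random_state)


# Hypothetical data: 1000 rows split 60/30/10 across three groups.
cells = pd.DataFrame({
    "group_column": ["a"] * 600 + ["b"] * 300 + ["c"] * 100,
    "value": np.arange(1000),
})

sampler = partial(group_sampler, total_df_len=len(cells), n_samples=200, random_state=0)

# Sample each group separately, then stitch the pieces back together.
sample = pd.concat([sampler(group) for _, group in cells.groupby("group_column")])

# Compare the share of each group in the original data and in the sample.
original_share = cells["group_column"].value_counts(normalize=True).sort_index()
sample_share = sample["group_column"].value_counts(normalize=True).sort_index()
print(pd.DataFrame({"original": original_share, "sample": sample_share}))
```

In this toy case the group sizes divide evenly, so the sample has exactly 200 rows. With uneven group sizes, rounding up can push the total a few rows above `n_samples`, by at most one extra row per group.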