@jcbozonier
Created May 22, 2017 12:28
Creating indexes for vectorization
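import pandas as pd

# salary_df is assumed to be a DataFrame with 'degree', 'state', and 'county' columns.
# Index each unique value (or combination of values) at every level of the hierarchy.
# groupby(...).all() is used only to get one row per group; the double reset_index()
# turns the group keys into columns and then adds a 0-based 'index' column.
degree_index = salary_df.groupby('degree').all().reset_index().reset_index()[['index', 'degree']]
degree_state_index = salary_df.groupby(['degree', 'state']).all().reset_index().reset_index()[['index', 'degree', 'state']]
degree_state_county_index = salary_df.groupby(['degree', 'state', 'county']).all().reset_index().reset_index()[['index', 'degree', 'state', 'county']]
# Combine the levels: 'index_d' (degree), 'index_ds' (degree/state), 'index' (degree/state/county).
degree_state_indexes_df = pd.merge(degree_index, degree_state_index, how='inner', on='degree', suffixes=('_d', '_ds'))
degree_state_county_indexes_df = pd.merge(degree_state_indexes_df, degree_state_county_index, how='inner', on=['degree', 'state'])
# Attach all three indexes to each row; the final reset_index() adds a 'level_0' row id because 'index' is already taken.
indexed_salary_df = pd.merge(salary_df, degree_state_county_indexes_df, how='inner', on=['degree', 'state', 'county']).reset_index()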
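To make the result concrete, here is a minimal sketch of the same idea on a toy DataFrame. It is not the gist's method: it swaps in `groupby(...).ngroup()` as an alternative way to number the groups. The sample rows, the `toy` and `index_dsc` names, and the `degree_effect` array are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real salary_df is not shown in the gist.
salary_df = pd.DataFrame({
    'degree': ['BS', 'BS', 'MS', 'MS', 'BS'],
    'state':  ['CA', 'CA', 'CA', 'NY', 'NY'],
    'county': ['Kern', 'Yolo', 'Kern', 'Erie', 'Erie'],
    'salary': [50.0, 55.0, 70.0, 65.0, 52.0],
})

# ngroup() numbers each group 0..n-1 in sorted key order,
# so no merges are needed to attach the indexes to the rows.
toy = salary_df.copy()
toy['index_d'] = toy.groupby('degree').ngroup()
toy['index_ds'] = toy.groupby(['degree', 'state']).ngroup()
toy['index_dsc'] = toy.groupby(['degree', 'state', 'county']).ngroup()

# Vectorized lookup: one made-up parameter per degree (BS, MS),
# expanded to one value per row by integer-array indexing.
degree_effect = np.array([10.0, 20.0])
row_effect = degree_effect[toy['index_d'].to_numpy()]
```

Integer indexes like these let a per-group parameter array be expanded to row level in a single indexing operation instead of a Python loop over groups.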