Last active
February 12, 2019 16:22
-
-
Save brew/813706202385c79009e18e39dc39b0ed to your computer and use it in GitHub Desktop.
Aggregates values for rows with duplicate index keys
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import pandas | |
value_keys = [ | |
"MONTO_APROBADO", | |
"MONTO_MODIFICADO", | |
"MONTO_EJERCIDO" | |
] | |
FILENAME = 'OS_Presupuesto_2018CDMX' | |
# Make sure OS_Presupuesto_2018CDMX.csv is saved as utf-8. | |
# Assign str type to all columns | |
df = pandas.read_csv('{}.csv'.format(FILENAME), encoding='utf-8', dtype=str) | |
# Assign float to value columns | |
for k in value_keys: | |
df[k] = df[k].astype(float) | |
index_keys = [k for k in df.keys().values if k not in value_keys] | |
# We have to fill empty cells, otherwise they are NaN which will be ignored by | |
# the df.groupby. | |
df = df.fillna('') | |
# Retain first duplicate row, then drop the rest. | |
# df.drop_duplicates(inplace=True, subset=index_keys) | |
# Or, separately sum value columns for duplicates. | |
df = df.groupby(index_keys, as_index=False)[value_keys].agg('sum').round(2) | |
df.to_csv('{}.aggduped.csv'.format(FILENAME), index=False) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment