Skip to content

Instantly share code, notes, and snippets.

@brew
Last active February 12, 2019 16:22
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save brew/813706202385c79009e18e39dc39b0ed to your computer and use it in GitHub Desktop.
Save brew/813706202385c79009e18e39dc39b0ed to your computer and use it in GitHub Desktop.
Aggregates values for rows with duplicate index keys
import pandas
value_keys = [
"MONTO_APROBADO",
"MONTO_MODIFICADO",
"MONTO_EJERCIDO"
]
FILENAME = 'OS_Presupuesto_2018CDMX'
# Make sure OS_Presupuesto_2018CDMX.csv is saved as utf-8.
# Assign str type to all columns
df = pandas.read_csv('{}.csv'.format(FILENAME), encoding='utf-8', dtype=str)
# Assign float to value columns
for k in value_keys:
df[k] = df[k].astype(float)
index_keys = [k for k in df.keys().values if k not in value_keys]
# We have to fill empty cells, otherwise they are NaN which will be ignored by
# the df.groupby.
df = df.fillna('')
# Retain first duplicate row, then drop the rest.
# df.drop_duplicates(inplace=True, subset=index_keys)
# Or, separately sum value columns for duplicates.
df = df.groupby(index_keys, as_index=False)[value_keys].agg('sum').round(2)
df.to_csv('{}.aggduped.csv'.format(FILENAME), index=False)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment