@brew
Created February 6, 2019 12:16
Remove duplicate rows from a source CSV file based on a subset of columns
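import pandas as pd

# Columns used to decide whether two rows are duplicates.
KEY_COLUMNS = [
    "CICLO",
    "GOBIERNO_GEN",
    "SECTOR",
    "SUBSECTOR",
    "UNIDAD_RESPONSABLE",
    "FINALIDAD",
    "FUNCION",
    "SUBFUNCION",
    "AREA_FUNCIONAL",
    "PROGRAMA_PRESUPUESTARIO",
    "FUENTE_FINANCIAMIENTO",
    "ORIGEN_RECURSO",
    "CAPITULO",
    "CONCEPTO",
    "PARTIDA_GEN",
    "PARTIDA_ESP",
    "TIPO_GASTO",
    "PROYECTO_INV",
]

# The source file is read as Latin-1.
df = pd.read_csv('OS_Presupuesto_2018CDMX.csv', encoding='latin_1')

# Keep the first row of each KEY_COLUMNS combination (keep='first' is the
# default); later rows with the same combination are dropped.
df = df.drop_duplicates(subset=KEY_COLUMNS)

# index=False omits the pandas row index. Output is written as UTF-8
# (the to_csv default), unlike the Latin-1 input.
df.to_csv('OS_Presupuesto_2018CDMX.deduped.csv', index=False)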
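
Because only a subset of columns decides what counts as a duplicate, rows that differ in the remaining columns are dropped too. It can be worth looking at the duplicate groups before writing the output. The sketch below shows one way to do that on invented toy data: the column names `CICLO` and `SECTOR` are borrowed from the script, while `MONTO` and all values are hypothetical and not taken from the real file.

```python
import pandas as pd

# Toy data for illustration only; "MONTO" is a hypothetical non-key column.
toy = pd.DataFrame({
    "CICLO": [2018, 2018, 2018, 2018, 2018],
    "SECTOR": ["A", "A", "B", "B", "C"],
    "MONTO": [10, 20, 30, 30, 40],
})
key = ["CICLO", "SECTOR"]

# keep=False flags every member of a duplicate group, not only the repeats,
# so the rows can be compared side by side.
dup_groups = toy[toy.duplicated(subset=key, keep=False)]
print(dup_groups)

# The default keep="first" retains the first row of each group.
deduped = toy.drop_duplicates(subset=key)
print(len(toy) - len(deduped), "rows dropped")
print(deduped)
```

Here the two `SECTOR == "A"` rows carry different `MONTO` values, yet only the first survives. If that would lose information in the real data, the subset may need more columns, or the groups may need to be aggregated instead of dropped.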