@wingkwong
Created November 17, 2019 13:34
Removing duplicate records in a CSV file using Pandas
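import pandas as pd
# keep_default_na=False keeps empty cells (and strings such as 'NA') as text instead of converting them to NaN
d = pd.read_csv('CSV_FILE.csv', keep_default_na=False)
# A row is a duplicate when all composite-key columns match an earlier row; keep='first' keeps that earlier row
d.drop_duplicates(subset=['COMPOSITE_KEY1', 'COMPOSITE_KEY2', 'COMPOSITE_KEY3', 'COMPOSITE_KEY4', 'COMPOSITE_KEY5',
                          'COMPOSITE_KEY6', 'COMPOSITE_KEY7', 'COMPOSITE_KEY8', 'COMPOSITE_KEY9', 'COMPOSITE_KEY10'],
                  inplace=True, keep='first')
d.to_csv('CSV_FILE_PROCESSED.csv', index=False)  # index=False: don't write the row index as an extra column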
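To see what the script does without a real file, here is a minimal, self-contained sketch of the same three steps (read, drop duplicates, write) on a small in-memory CSV. The columns `id`, `date` and `value` and the sample rows are hypothetical stand-ins for `CSV_FILE.csv` and its `COMPOSITE_KEY` columns, and `io.StringIO` replaces the input and output file paths.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for CSV_FILE.csv; 'id' and 'date' play the composite-key role
raw = io.StringIO(
    "id,date,value\n"
    "1,2019-11-01,a\n"
    "1,2019-11-01,b\n"  # same (id, date) as the row above, so it is a duplicate
    "1,2019-11-02,c\n"
    "2,,d\n"            # empty date is read as '' because keep_default_na=False
    "2,,e\n"            # same (id, '') as the row above, so it is a duplicate
)

d = pd.read_csv(raw, keep_default_na=False)

# Same call as the script, with the hypothetical key columns; keeps the first row of each key
d = d.drop_duplicates(subset=['id', 'date'], keep='first')

# Write to an in-memory buffer instead of CSV_FILE_PROCESSED.csv, again without the index column
out = io.StringIO()
d.to_csv(out, index=False)
print(out.getvalue())
```

Passing `keep='last'` keeps the last occurrence of each key instead, and `keep=False` drops every row whose key appears more than once.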