
@danielrobertson
Created March 19, 2015 17:48
Remove duplicates in CSV
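import pandas as pd

# Load the CSV; 'fileWithDuplicates.csv' and 'columnName' are placeholders
toclean = pd.read_csv('fileWithDuplicates.csv')
# Keep the first row for each value of columnName (omit subset to compare whole rows)
deduped = toclean.drop_duplicates(subset='columnName')
# index=False stops pandas from writing the row index as an extra leading column
deduped.to_csv('fileWithoutDuplicates.csv', index=False)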
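A quick way to check which rows `drop_duplicates` keeps is to run it on a tiny in-memory CSV. The sketch below uses made-up sample data (the `id`, `email` and `score` columns are hypothetical) in place of `fileWithDuplicates.csv`, and compares whole-row deduplication with the `keep` options for a single column:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'fileWithDuplicates.csv'
raw = io.StringIO(
    "id,email,score\n"
    "1,a@example.com,10\n"
    "2,b@example.com,20\n"
    "3,a@example.com,30\n"
    "2,b@example.com,20\n"
    "4,c@example.com,40\n"
)
df = pd.read_csv(raw)

# No subset: only rows identical in every column count as duplicates
rows_deduped = df.drop_duplicates()

# Duplicates judged on one column; keep='first' (the default) keeps the earliest row
first_per_email = df.drop_duplicates(subset="email", keep="first")

# keep='last' keeps the latest row for each email instead
last_per_email = df.drop_duplicates(subset="email", keep="last")

# keep=False drops every row whose email appears more than once
unique_emails_only = df.drop_duplicates(subset="email", keep=False)

# index=False writes only the data columns, matching the input layout
out = io.StringIO()
first_per_email.to_csv(out, index=False)
print(out.getvalue())
```

Which `keep` value is right depends on whether the earliest row, the latest row, or no row at all should survive for a repeated key.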
@danielrobertson
Author

For xlsx files, first convert them to CSV with ssconvert (the command-line converter that ships with Gnumeric):
ssconvert myFile.xlsx myNewFile.csv
