@ViktorNova
Created April 21, 2016 23:54
This cleans up a CSV in three steps: it replaces commas inside double-quoted fields with spaces, strips all double quotes, and removes duplicate rows based on one column (column 7, the phone number).
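#!/bin/bash
# Cleans up a CSV and writes the result to <input>-OUTPUT.csv
# Requires GNU sed (the pattern uses the \? and \| extensions).
# Note: a header row, if present, is sorted along with the data.
if [ $# -ne 1 ]; then
    echo "Usage: $0 FILE.csv" >&2
    exit 1
fi
echo "Replacing commas between quotes with spaces" # Quick and dirty
echo "Then removing quotes"
echo "Lastly, we remove duplicates based on Column 7 - Phone number"
# Quote "$1" so file names containing spaces work; sed reads the file directly.
sed ':a;s/^\(\([^"]*,\?\|"[^",]*",\?\)*"[^",]*\),/\1 /;ta' "$1" | \
sed 's/"//g' | \
sort -t ',' -k 7,7 -u > "$1-OUTPUT.csv"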
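
The script sorts every line, so a header row would end up somewhere in the middle of the output. The gist does not say whether the input has a header, so the following is only a sketch under that assumption. The function name `clean_csv_keep_header` and its optional column argument are hypothetical and not part of the original script. It runs the same three stages on standard input, but prints the first line unchanged before sorting the rest.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original gist).
# Assumes the first input line is a header that should stay on top.
# Requires GNU sed, like the original pipeline.
clean_csv_keep_header() {
    # Column to deduplicate on; defaults to 7 to match the script above.
    local col="${1:-7}"
    # Stage 1: replace commas inside quoted fields with spaces.
    # Stage 2: strip all double quotes.
    sed ':a;s/^\(\([^"]*,\?\|"[^",]*",\?\)*"[^",]*\),/\1 /;ta' | sed 's/"//g' | {
        # Pass the header through untouched, then
        # Stage 3: sort the remaining rows, keeping one row per key.
        IFS= read -r header && printf '%s\n' "$header"
        sort -t ',' -k "$col,$col" -u
    }
}
```

It could be called as `clean_csv_keep_header 7 < input.csv > input-OUTPUT.csv`. Which of several duplicate rows survives is decided by `sort -u`, exactly as in the original.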