@ianjsikes
Created May 3, 2019 20:34
Deduplicates entries in an input JSON Lines file by the 'sellerId' property.
#!/bin/bash
# Require an input path; the output path is optional.
if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
  echo "You must provide at least one file path as an argument."
  echo "Usage: dedup <input_path> [output_path]"
  echo "If output_path is omitted, the input file will be overwritten with the output."
  exit 1
fi
in_file=$1
# Default to overwriting the input file when no output path is given.
out_file=${2:-$1}
in_lines=$(wc -l < "$in_file" | xargs)
echo "Processing $in_lines lines of input..."
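# Write to a temporary file first: redirecting straight to $out_file would
# truncate the input before it is read when both paths are the same.
tmp_file=$(mktemp) || exit 1
if ! rq 'uniqBy "sellerId"' < "$in_file" > "$tmp_file"; then
  rm -f "$tmp_file"
  exit 1
fi
# Copy rather than move so an existing output file keeps its permissions.
cat "$tmp_file" > "$out_file"
rm -f "$tmp_file"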
out_lines=$(wc -l < "$out_file" | xargs)
dupe_items=$((in_lines - out_lines))
echo "Removed $dupe_items duplicate items"
echo "Saved $out_lines unique items to $out_file"
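
The script depends on `rq` for the actual deduplication. For readers who want to see the idea without it, here is a rough awk sketch that keeps the first line seen for each `sellerId`. It is not how `rq` works internally: the function name `dedup_by_seller_id` and the sample records are made up for illustration, it assumes one flat JSON object per line, and it matches the key with a regular expression rather than parsing JSON. Keeping the first occurrence and passing through lines that have no `sellerId` are choices of this sketch, not confirmed `rq` behavior.

```shell
#!/bin/bash
# Hypothetical stand-in for the rq step: keep the first line seen for each
# sellerId. Assumes one flat JSON object per line; this is a regex match,
# not a JSON parser, so nested or escaped values are out of scope.
dedup_by_seller_id() {
  awk '
    match($0, /"sellerId"[ \t]*:[ \t]*("[^"]*"|[0-9]+)/) {
      key = substr($0, RSTART, RLENGTH)
      sub(/"[ \t]*:[ \t]*/, "\":", key)   # ignore spacing around the colon
      if (!(key in seen)) { seen[key] = 1; print }
      next
    }
    { print }                             # no sellerId: pass the line through
  '
}

# Made-up sample data for illustration.
sample='{"sellerId":"a1","name":"first"}
{"sellerId":"b2","name":"second"}
{"sellerId": "a1","name":"duplicate"}'

printf '%s\n' "$sample" | dedup_by_seller_id
```

The third record shares its `sellerId` with the first, so only the first two lines are printed. For real data with nested objects or escaped quotes, a proper JSON tool is the safer choice.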