@cwilper
Last active June 9, 2020 16:40
#!/usr/bin/env bash
#
# Reads https://raw.githubusercontent.com/2020PB/police-brutality/data_build/all-locations.json
# and prints a JSON report including the total number of records, the list of duplicate ids, if any,
# and the list of records with missing ids, if any.
#
# Requires: jq, curl
#
# Usage: ./id-report.sh [file]
#
# If no file is given, the latest copy of all-locations.json on GitHub will be downloaded and used.
#
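if [[ $1 ]]; then
  # Exit only when the file is missing. The previous "A || B && C" form
  # ran "exit 1" even when the file existed.
  if [[ ! -f $1 ]]; then
    echo "Error: No such file: $1" >&2
    exit 1
  fi
  json=$(cat "$1")
else
  json=$(curl -sL "https://raw.githubusercontent.com/2020PB/police-brutality/data_build/all-locations.json")
fi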
count_expr='.data | length'
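# Only consider records that have an id, so several id-less records are not reported as a duplicate null.
dupe_expr='[.data[] | select(has("id")) | .id] | group_by(.) | map(select(length>1) | .[0])'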
missing_expr='[.data[] | select(has("id") | not)]'
echo "$json" | jq "{record_count: $count_expr, duplicate_ids: $dupe_expr, records_missing_ids: $missing_expr}"
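
To see what the report looks like without downloading anything, the same three jq expressions can be run against a small inline document. This is only a sketch: the `sample` data and the `id_report` helper name are made up for illustration and are not part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical sample: two records share the id "a", and one record has no id at all.
sample='{"data":[{"id":"a"},{"id":"b"},{"id":"a"},{"name":"no-id"}]}'

# Hypothetical helper: reads JSON on stdin and prints the report on one line (-c).
id_report() {
  jq -c '{
    record_count: (.data | length),
    duplicate_ids: ([.data[] | .id] | group_by(.) | map(select(length > 1) | .[0])),
    records_missing_ids: [.data[] | select(has("id") | not)]
  }'
}

report=$(printf '%s' "$sample" | id_report)
echo "$report"
```

For this sample the report should show a record count of 4, the single duplicate id "a", and one record missing an id.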