@igorsol
Last active June 12, 2021 13:40
json filtering with jq utility

So I suddenly discovered that compile_commands.json files generated by the Bear tool contain duplicate entries, despite the fact that Bear's documentation says it should filter them out. This was probably due to an outdated (two-year-old) version of Bear; I have since updated it, but I haven't had time to check whether the issue is already fixed.

How do you filter out duplicate entries from a JSON array with thousands of elements, preferably keeping only the last element of each group, since it contains the most up-to-date compilation parameters?

One way is simply to remove compile_commands.json and rebuild the whole project, but that would be very tedious: on my machine it takes more than an hour to rebuild the project I work on, and I have a dozen different versions in separate directories... After some thinking I recalled a tool called jq, which is a 'JSON query' utility. I had some doubts about whether I would be able to do my specific task with it, so I went to the documentation page: https://stedolan.github.io/jq/manual/

After many unsuccessful attempts I found the correct syntax to filter out the duplicate entries:

jq '[group_by(.file)[] | last]' compile_commands.json > compile_commands_reduced.json
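The keep-the-last-entry logic of that filter can be sketched in Python (the sample entries below are hypothetical; `file` and `command` are the keys compile_commands.json actually uses). One caveat: jq's group_by also sorts the output by the grouping key, which this sketch skips.

```python
import json

# Minimal stand-in for a compile_commands.json with duplicate entries
# for the same source file (sample data is hypothetical).
entries = [
    {"file": "a.c", "command": "gcc -O0 -c a.c"},
    {"file": "b.c", "command": "gcc -c b.c"},
    {"file": "a.c", "command": "gcc -O2 -c a.c"},  # newer duplicate
]

# Rough equivalent of jq's '[group_by(.file)[] | last]': a dict keyed
# by .file keeps only the last-seen entry for each file.
deduped = list({e["file"]: e for e in entries}.values())

print(json.dumps(deduped, indent=2))
```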

Some files were reduced from ~20000 records to fewer than 2000! To check the number of records I used the following syntax:

# length of the original file
jq 'length' compile_commands.json
# length of the reduced file
jq '[group_by(.file)[]|last]|length' compile_commands.json
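The same before/after count can be sketched in Python, here against a small hypothetical stand-in file rather than a real compile_commands.json:

```python
import json
import os
import tempfile

# Hypothetical stand-in contents for compile_commands.json.
entries = [
    {"file": "a.c", "command": "gcc -O0 -c a.c"},
    {"file": "a.c", "command": "gcc -O2 -c a.c"},
    {"file": "b.c", "command": "gcc -c b.c"},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(entries, f)
    path = f.name

# Load and count, mirroring the two jq 'length' checks above.
with open(path) as f:
    loaded = json.load(f)
original_len = len(loaded)              # like: jq 'length'
reduced_len = len({e["file"] for e in loaded})  # like the filtered length
print(original_len, reduced_len)        # prints: 3 2

os.remove(path)
```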

After doing this I am really impressed by the possibilities of the jq tool.
