Using jq
is great for examining JSON objects, and you can extend its functionality with custom functions. The following is useful for understanding, at a high level, the structure of arbitrary JSON documents, which helps when trying to understand new data sources.
Requires jq version 1.5.
Add the following function to your ~/.jq:
def profile_object:
to_entries | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
| sort_by(.key) | from_entries;
Here is a simple example JSON to profile:
{
"some-string": "alpha beta",
"some-number": 31415,
"some-array": ["a", "b", "c"],
"some-date": "1970-01-01T00:00:00",
"some-object": { "ssn": "yeah right" }
}
Assuming that the above example has been written to a file called /tmp/foo.json, running:
cat /tmp/foo.json | jq "profile_object"
will produce:
{
"some-array": "array",
"some-date": "string",
"some-number": "number",
"some-object": "object",
"some-string": "string"
}
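If you just want to try the function without editing ~/.jq, the definition can be inlined on the command line. A minimal sketch, using a made-up sample object (the definition is restated compactly here, with the inner def replaced by a direct map, which is equivalent to the version above):

```shell
# Inline the profile_object definition instead of installing it in ~/.jq.
# The sample object is hypothetical; any JSON object works.
echo '{"a": 1, "b": "x"}' | jq -c '
  def profile_object:
    to_entries
    | map({"key": .key, "value": (.value | type)})
    | sort_by(.key)
    | from_entries;
  profile_object'
# → {"a":"number","b":"string"}
```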
Here is a simple example array of JSON objects to be used for both cases:
[
{
"some-string": "alpha beta",
"some-number": 31415,
"some-array": ["a", "b", "c"],
"some-date": "1970-01-01T00:00:00",
"some-object": { "ssn": "yeah right" }
},
{
"some-string": "alpha beta",
"some-number": 31415,
"some-array": ["a", "b", "c"],
"some-date": "1970-01-01T00:00:00",
"some-object": { "ssn": "yeah right" }
}
]
Add the following function to your ~/.jq, in addition to the function defined above:
def profile_array_objects:
map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item) | sort_by(.key) | from_entries;
Assuming that the above example has been written to a file called /tmp/foo.json, running:
cat /tmp/foo.json | jq "profile_array_objects"
will produce:
{
"some-array": "array",
"some-date": "string",
"some-number": "number",
"some-object": "object",
"some-string": "string"
}
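Note that from_entries keeps only one value per key, so when the array's objects disagree on a key's type, the profile reflects just one of them (sort_by is stable, so the entry from the later object wins). A small sketch with made-up data, with both definitions inlined and restated compactly (inner defs replaced by direct maps, equivalent to the versions above):

```shell
# Hypothetical array whose objects disagree on the type of "a" and
# don't both have key "b": the profile unions the keys, and for "a"
# the entry from the later object wins.
echo '[{"a": 1}, {"a": "x", "b": true}]' | jq -c '
  def profile_object:
    to_entries
    | map({"key": .key, "value": (.value | type)})
    | sort_by(.key)
    | from_entries;
  def profile_array_objects:
    map(profile_object)
    | map(to_entries)
    | reduce .[] as $item ([]; . + $item)
    | sort_by(.key)
    | from_entries;
  profile_array_objects'
# → {"a":"string","b":"boolean"}
```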
Add the following function to your ~/.jq, in addition to the functions defined above:
def profile_array_objects_with_freq:
map(profile_object) | map(to_entries) | flatten | group_by(.key)
| def create_profile_entry:
{"key": .[0] | .key, "value": { "count": . | length, "type": .[0] | .value }};
map(create_profile_entry) | from_entries;
Assuming that the above example has been written to a file called /tmp/foo.json, running:
cat /tmp/foo.json | jq "profile_array_objects_with_freq"
will produce:
{
"some-array": {
"count": 2,
"type": "array"
},
"some-date": {
"count": 2,
"type": "string"
},
"some-number": {
"count": 2,
"type": "number"
},
"some-object": {
"count": 2,
"type": "object"
},
"some-string": {
"count": 2,
"type": "string"
}
}
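The count field makes the frequency version handy for spotting optional keys: a key whose count is below the array length is missing from some objects. A sketch with made-up data where "b" appears only once (definitions inlined and restated compactly, with the inner defs replaced by direct maps, equivalent to the versions above):

```shell
# Hypothetical array where "b" is present in only one of two objects;
# its count of 1 (versus an array length of 2) flags it as optional.
echo '[{"a": 1, "b": "x"}, {"a": 2}]' | jq -c '
  def profile_object:
    to_entries
    | map({"key": .key, "value": (.value | type)})
    | sort_by(.key)
    | from_entries;
  def profile_array_objects_with_freq:
    map(profile_object)
    | map(to_entries)
    | flatten
    | group_by(.key)
    | map({"key": .[0].key, "value": {"count": length, "type": .[0].value}})
    | from_entries;
  profile_array_objects_with_freq'
# → {"a":{"count":2,"type":"number"},"b":{"count":1,"type":"string"}}
```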
@mikehwang - I've developed a similar "schema inference engine" that infers a simple structural schema (SSS) from one or more JSON documents, a full-fledged schema language (JESS) that extends the SSS language, and a verification tool that you might be interested in.
An example of how everything fits together using a NASA JSON data set is at https://bitbucket.org/pkoppstein/jess/wiki/Case%20Study%20-%20near_earth_asteroids.json:%20from%20inference%20to%20verification
Your feedback would be appreciated!