@mikehwang
Last active April 7, 2024 17:55
Use jq to profile the schema of a given JSON object or an array of JSON objects

Profile JSON schema

jq is great for examining JSON objects, and you can extend it with custom functions. The functions below summarize the structure of arbitrary JSON at a high level, which helps when you are trying to understand a new data source.

Requires jq version 1.5 or later.

Profile an object

Add the following function to your ~/.jq:

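# Map each key of the input object to the jq type name of its value
# ("string", "number", "array", ...), with keys in sorted order.
def profile_object:
    to_entries | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
        | sort_by(.key) | from_entries;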

Here is a simple example JSON to profile:

{
    "some-string": "alpha beta",
    "some-number": 31415,
    "some-array": ["a", "b", "c"],
    "some-date": "1970-01-01T00:00:00",
    "some-object": { "ssn": "yeah right" }
}

Assuming the example above has been saved as /tmp/foo.json, run:

jq 'profile_object' /tmp/foo.json

This produces:

{
  "some-array": "array",
  "some-date": "string",
  "some-number": "number",
  "some-object": "object",
  "some-string": "string"
}
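
For readers who want to cross-check the result outside jq, the same idea can be sketched in Python. This is a minimal illustration, not part of the gist: `jq_type` and the Python `profile_object` are hypothetical helpers that mimic the type names returned by jq's `type` builtin for values decoded by `json.loads`.

```python
import json

# Type names as jq's `type` builtin reports them, keyed by the Python type
# that json.loads produces for each kind of JSON value (assumed mapping).
JQ_TYPE_NAMES = {
    type(None): "null",
    bool: "boolean",
    int: "number",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
}

def jq_type(value):
    """Return the jq-style type name for a value decoded by json.loads."""
    return JQ_TYPE_NAMES[type(value)]

def profile_object(obj):
    """Map each key to the type name of its value, with keys in sorted order."""
    return {key: jq_type(obj[key]) for key in sorted(obj)}

doc = json.loads("""
{
    "some-string": "alpha beta",
    "some-number": 31415,
    "some-array": ["a", "b", "c"],
    "some-date": "1970-01-01T00:00:00",
    "some-object": { "ssn": "yeah right" }
}
""")

profile = profile_object(doc)
print(json.dumps(profile, indent=2))
```

Like the jq version, this only looks at top-level keys: "some-object" is reported as "object" without descending into it, and "some-date" is a plain "string" because JSON has no date type.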

Profile an array of objects

Here is a simple example array of JSON objects to be used for both cases:

[
    {
        "some-string": "alpha beta",
        "some-number": 31415,
        "some-array": ["a", "b", "c"],
        "some-date": "1970-01-01T00:00:00",
        "some-object": { "ssn": "yeah right" }
    },
    {
        "some-string": "alpha beta",
        "some-number": 31415,
        "some-array": ["a", "b", "c"],
        "some-date": "1970-01-01T00:00:00",
        "some-object": { "ssn": "yeah right" }
    }
]

By data type

Add the following function to your ~/.jq alongside profile_object, which it depends on:

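# Merge the profiles of all objects in the array into one sorted object.
# A key whose type varies between objects keeps only one of those types.
def profile_array_objects:
    map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item) | sort_by(.key) | from_entries;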

Assuming the array above has been saved as /tmp/foo.json, run:

jq 'profile_array_objects' /tmp/foo.json

This produces:

{
  "some-array": "array",
  "some-date": "string",
  "some-number": "number",
  "some-object": "object",
  "some-string": "string"
}
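
One thing to keep in mind: because from_entries keeps a single value per key, a key whose type differs between objects is reported with only one of those types. The Python sketch below is an illustration under that observation, not the gist's method. `profile_array_types` is a hypothetical helper, and the sample records are made up. It collects every type seen for each key so that mixed-type keys become visible.

```python
import json

# jq-style type names for values decoded by json.loads (assumed mapping).
JQ_TYPE_NAMES = {
    type(None): "null",
    bool: "boolean",
    int: "number",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
}

def jq_type(value):
    """Return the jq-style type name for a value decoded by json.loads."""
    return JQ_TYPE_NAMES[type(value)]

def profile_array_types(objects):
    """Map each key to the sorted list of type names seen for it across all objects."""
    seen = {}
    for obj in objects:
        for key, value in obj.items():
            seen.setdefault(key, set()).add(jq_type(value))
    return {key: sorted(seen[key]) for key in sorted(seen)}

# Made-up records in which "id" and "tag" change type between objects.
records = json.loads('[{"id": 1, "tag": "a"}, {"id": "2", "tag": null}, {"id": 3}]')

types_by_key = profile_array_types(records)
print(json.dumps(types_by_key))
# prints {"id": ["number", "string"], "tag": ["null", "string"]}
```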

By data type and frequency

Add the following function to your ~/.jq alongside profile_object, which it depends on:

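# For each key, report how many objects in the array contain it ("count")
# and its type ("type"). The type is read from the first entry in each
# group, so a key with mixed types still shows a single type.
def profile_array_objects_with_freq:
    map(profile_object) | map(to_entries) | flatten | group_by(.key)
        | def create_profile_entry:
            {"key": .[0] | .key, "value": { "count": . | length, "type": .[0] | .value }};
        map(create_profile_entry) | from_entries;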

Assuming the array above has been saved as /tmp/foo.json, run:

jq 'profile_array_objects_with_freq' /tmp/foo.json

This produces:

{
  "some-array": {
    "count": 2,
    "type": "array"
  },
  "some-date": {
    "count": 2,
    "type": "string"
  },
  "some-number": {
    "count": 2,
    "type": "number"
  },
  "some-object": {
    "count": 2,
    "type": "object"
  },
  "some-string": {
    "count": 2,
    "type": "string"
  }
}
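
The "type" field here is read from the first entry of each group (`.[0] | .value`), so the counts do not show whether a key changes type between objects. As a rough sketch of one way to surface that, the Python below counts occurrences per key and per type. It is an illustration only: `profile_array_with_type_counts`, the "types" field and the sample records are assumptions, not part of the gist.

```python
import json
from collections import Counter

# jq-style type names for values decoded by json.loads (assumed mapping).
JQ_TYPE_NAMES = {
    type(None): "null",
    bool: "boolean",
    int: "number",
    float: "number",
    str: "string",
    list: "array",
    dict: "object",
}

def jq_type(value):
    """Return the jq-style type name for a value decoded by json.loads."""
    return JQ_TYPE_NAMES[type(value)]

def profile_array_with_type_counts(objects):
    """For each key, count the objects containing it, broken down by value type."""
    per_key = {}
    for obj in objects:
        for key, value in obj.items():
            per_key.setdefault(key, Counter())[jq_type(value)] += 1
    return {
        key: {
            "count": sum(per_key[key].values()),
            "types": dict(sorted(per_key[key].items())),
        }
        for key in sorted(per_key)
    }

# Made-up records in which "id" and "tag" change type between objects.
records = json.loads('[{"id": 1, "tag": "a"}, {"id": "2", "tag": null}, {"id": 3}]')

freq = profile_array_with_type_counts(records)
print(json.dumps(freq, indent=2))
```

For these records, "id" appears three times (twice as a number, once as a string) and "tag" twice (once as a string, once as null). A "count" lower than the array length also indicates that the key is missing from some objects.
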
@pkoppstein

pkoppstein commented Aug 3, 2019

@mikehwang - I've developed three things you might be interested in: a similar "schema inference engine" that infers a simple structural schema (SSS) from one or more JSON documents; a full-fledged schema language (JESS) that extends the SSS language; and a verification tool.

An example of how everything fits together using a NASA JSON data set is at https://bitbucket.org/pkoppstein/jess/wiki/Case%20Study%20-%20near_earth_asteroids.json:%20from%20inference%20to%20verification

Some other links:

Your feedback would be appreciated!

@gust-p

gust-p commented Mar 5, 2024

I didn't even know that this was possible with jq. It never stops impressing me. Thanks for this.
