

@jazzwang
Forked from mikehwang/jq-profilejsonschema.md
Created October 4, 2022 00:37
Use jq to profile the schema of a given JSON object or an array of JSON objects

Profile JSON schema

jq is great for examining JSON, and you can extend it with your own functions. The functions below give a high-level view of the structure of arbitrary JSON, which helps when you are trying to understand a new data source.

Requires jq version 1.5 or later.

Profile an object

Add the following function to your ~/.jq (when ~/.jq is a file, jq loads its definitions automatically):

def profile_object:
    # Replace each top-level value with its type name, then sort the entries by key.
    to_entries
    | map({"key": .key, "value": (.value | type)})
    | sort_by(.key)
    | from_entries;

Here is a simple example JSON object to profile:

{
    "some-string": "alpha beta",
    "some-number": 31415,
    "some-array": ["a", "b", "c"],
    "some-date": "1970-01-01T00:00:00",
    "some-object": { "ssn": "yeah right" }
}

Assuming the example above has been saved to a file called /tmp/foo.json, running the following:

cat /tmp/foo.json | jq "profile_object"

will produce:

{
  "some-array": "array",
  "some-date": "string",
  "some-number": "number",
  "some-object": "object",
  "some-string": "string"
}
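To check the logic outside jq, here is a minimal Python sketch of what profile_object computes. It assumes the input has already been decoded with Python's json module; jq_type is a hypothetical helper that mimics jq's built-in type by mapping decoded values to jq's six type names.

```python
import json

def jq_type(value):
    """Hypothetical helper mirroring jq's `type` for json.loads output."""
    if value is None:
        return "null"
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return {str: "string", list: "array", dict: "object"}[type(value)]

def profile_object(obj):
    """Map each top-level key to the jq type name of its value, keys sorted."""
    return {key: jq_type(obj[key]) for key in sorted(obj)}

example = json.loads('''{
    "some-string": "alpha beta",
    "some-number": 31415,
    "some-array": ["a", "b", "c"],
    "some-date": "1970-01-01T00:00:00",
    "some-object": { "ssn": "yeah right" }
}''')

print(json.dumps(profile_object(example), indent=2))
```

Like the jq version, it reports JSON types only, so "some-date" comes out as "string" (JSON has no date type), and it does not descend into nested values such as "some-object".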

Profile an array of objects

Here is a simple example array of JSON objects, used in both of the cases below:

[
    {
        "some-string": "alpha beta",
        "some-number": 31415,
        "some-array": ["a", "b", "c"],
        "some-date": "1970-01-01T00:00:00",
        "some-object": { "ssn": "yeah right" }
    },
    {
        "some-string": "alpha beta",
        "some-number": 31415,
        "some-array": ["a", "b", "c"],
        "some-date": "1970-01-01T00:00:00",
        "some-object": { "ssn": "yeah right" }
    }
]

By data type

Add the following function to your ~/.jq, alongside profile_object defined above (which it calls):

def profile_array_objects:
    # Profile each object, merge all entries (later objects win), then sort by key.
    map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item)
        | sort_by(.key) | from_entries;

Assuming the example array above has been saved to /tmp/foo.json, running the following:

cat /tmp/foo.json | jq "profile_array_objects"

will produce:

{
  "some-array": "array",
  "some-date": "string",
  "some-number": "number",
  "some-object": "object",
  "some-string": "string"
}
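Here is a similar Python sketch for profile_array_objects, under the same assumptions (input decoded with json.loads; jq_type is a hypothetical helper mirroring jq's type). Because the example array holds two identical objects, it hides one behavior worth knowing: the result is the union of keys across all objects, and since sort_by keeps equal keys in their original order and from_entries lets later entries overwrite earlier ones, a key whose type varies is reported with the type from the last object that contains it. The heterogeneous input below is made up to show this.

```python
import json

def jq_type(value):
    """Hypothetical helper mirroring jq's `type` for json.loads output."""
    if value is None:
        return "null"
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return {str: "string", list: "array", dict: "object"}[type(value)]

def profile_array_objects(objects):
    """Union of keys across all objects; the last object containing a key decides its type."""
    merged = {}
    for obj in objects:
        for key, value in obj.items():
            merged[key] = jq_type(value)  # later objects overwrite earlier ones
    return {key: merged[key] for key in sorted(merged)}

# Hypothetical heterogeneous input: "id" changes type and "tag" is missing from one object.
mixed = json.loads('[{"id": 1, "tag": "x"}, {"id": "A-2"}]')
print(json.dumps(profile_array_objects(mixed), indent=2))
```

Here the result is {"id": "string", "tag": "string"}: "tag" appears even though the second object lacks it, and "id" takes its type from the second object.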

By data type and frequency

Add the following function to your ~/.jq, alongside profile_object defined above (it does not depend on profile_array_objects):

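def profile_array_objects_with_freq:
    # Profile each object, then group the entries from all objects by key.
    map(profile_object) | map(to_entries) | flatten | group_by(.key)
        # For each group: count = number of objects containing the key,
        # type = the type recorded in the group's first entry.
        | def create_profile_entry:
            {"key": .[0] | .key, "value": { "count": . | length, "type": .[0] | .value }};
        map(create_profile_entry) | from_entries;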

Assuming the example array above has been saved to /tmp/foo.json, running the following:

cat /tmp/foo.json | jq "profile_array_objects_with_freq"

will produce:

{
  "some-array": {
    "count": 2,
    "type": "array"
  },
  "some-date": {
    "count": 2,
    "type": "string"
  },
  "some-number": {
    "count": 2,
    "type": "number"
  },
  "some-object": {
    "count": 2,
    "type": "object"
  },
  "some-string": {
    "count": 2,
    "type": "string"
  }
}
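For comparison, here is a Python sketch of profile_array_objects_with_freq, again assuming decoded input and the hypothetical jq_type helper. In the jq version, group_by(.key) collects one entry per object that has the key, so count is the number of objects containing it, and type is taken from the group's first entry, which (as group_by keeps the original order within a group) comes from the first object in the array that has the key. A count lower than the array length therefore flags an optional key. The input below is made up for illustration.

```python
import json

def jq_type(value):
    """Hypothetical helper mirroring jq's `type` for json.loads output."""
    if value is None:
        return "null"
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return {str: "string", list: "array", dict: "object"}[type(value)]

def profile_array_objects_with_freq(objects):
    """Per key: how many objects contain it, and its type in the first such object."""
    profile = {}
    for obj in objects:
        for key, value in obj.items():
            if key in profile:
                profile[key]["count"] += 1
            else:
                # The first object containing the key decides the reported type.
                profile[key] = {"count": 1, "type": jq_type(value)}
    return {key: profile[key] for key in sorted(profile)}

# Hypothetical heterogeneous input: "id" changes type and "tag" is optional.
mixed = json.loads('[{"id": 1, "tag": "x"}, {"id": "A-2"}]')
print(json.dumps(profile_array_objects_with_freq(mixed), indent=2))
```

Here "id" gets count 2 and type "number", while "tag" gets count 1 and type "string". Only the first type seen is reported, so the switch of "id" to a string is not visible in this profile.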