mikehwang/jq-profilejsonschema.md

## jq-profilejsonschema.md

      
          
Profile JSON schema

jq is great for examining JSON, and you can extend it with custom functions. The functions below summarize the high-level structure of an arbitrary JSON document, which helps when you are getting to know a new data source.
Requires jq version 1.5 or later.

Profile an object

Add the following function to your ~/.jq:
# Map each key of an object to the jq type name of its value, sorted by key.
def profile_object:
    to_entries
    | map({key: .key, value: (.value | type)})
    | sort_by(.key)
    | from_entries;

Here is a simple example JSON object to profile:
{
    "some-string": "alpha beta",
    "some-number": 31415,
    "some-array": ["a", "b", "c"],
    "some-date": "1970-01-01T00:00:00",
    "some-object": { "ssn": "yeah right" }
}

With the example above saved as /tmp/foo.json, run:
cat /tmp/foo.json | jq "profile_object"

This produces:
{
  "some-array": "array",
  "some-date": "string",
  "some-number": "number",
  "some-object": "object",
  "some-string": "string"
}
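If you would rather try the function before editing ~/.jq, it can also be defined inline in the jq program. The sketch below assumes jq 1.5 or later is on your PATH; the input object is a made-up sample chosen to show two more type names that jq reports, null and boolean.

```shell
# Define profile_object inline instead of loading it from ~/.jq.
# The input is a made-up sample that includes null and boolean values.
profile=$(echo '{"b": null, "a": true, "c": 1.5}' | jq -c '
  def profile_object:
      to_entries
      | map({key: .key, value: (.value | type)})
      | sort_by(.key)
      | from_entries;
  profile_object')
echo "$profile"
# prints {"a":"boolean","b":"null","c":"number"}
```

The -c flag only makes the output compact; drop it to get the indented form shown above.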

Profile an array of objects

Here is a simple example array of JSON objects, used in both cases below:
[
    {
        "some-string": "alpha beta",
        "some-number": 31415,
        "some-array": ["a", "b", "c"],
        "some-date": "1970-01-01T00:00:00",
        "some-object": { "ssn": "yeah right" }
    },
    {
        "some-string": "alpha beta",
        "some-number": 31415,
        "some-array": ["a", "b", "c"],
        "some-date": "1970-01-01T00:00:00",
        "some-object": { "ssn": "yeah right" }
    }
]

By data type

Add the following function to your ~/.jq, alongside profile_object from above:
# Profile each object in the array and merge the results into a single object.
def profile_array_objects:
    map(profile_object | to_entries) | flatten | sort_by(.key) | from_entries;

With the example array above saved as /tmp/foo.json, run:
cat /tmp/foo.json | jq "profile_array_objects"

This produces:
{
  "some-array": "array",
  "some-date": "string",
  "some-number": "number",
  "some-object": "object",
  "some-string": "string"
}
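One caveat: because the combined entries are passed to from_entries, each key ends up with a single type even when the objects disagree. The sketch below uses a made-up two-object array in which id is a number in one object and a string in the other, and defines compact equivalents of both functions inline (assuming jq 1.5 or later). In this run the type from the later object is the one that survives.

```shell
# Made-up input: "id" changes type between objects, "tag" is present only once.
mixed='[{"id": 1, "tag": "x"}, {"id": "two"}]'
merged=$(echo "$mixed" | jq -c '
  def profile_object:
      to_entries
      | map({key: .key, value: (.value | type)})
      | sort_by(.key)
      | from_entries;
  def profile_array_objects:
      map(profile_object | to_entries) | flatten | sort_by(.key) | from_entries;
  profile_array_objects')
echo "$merged"
# prints {"id":"string","tag":"string"}
```

Nothing in the output tells you that id was also seen as a number, so treat this profile as a quick overview rather than a strict schema.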

By data type and frequency

Add the following function to your ~/.jq, alongside profile_object from above:
# For each key, report how many objects contain it and the type from its first occurrence.
def profile_array_objects_with_freq:
    map(profile_object | to_entries) | flatten | group_by(.key)
        | map({key: .[0].key, value: {count: length, type: .[0].value}})
        | from_entries;

With the example array above saved as /tmp/foo.json, run:
cat /tmp/foo.json | jq "profile_array_objects_with_freq"

This produces:
{
  "some-array": {
    "count": 2,
    "type": "array"
  },
  "some-date": {
    "count": 2,
    "type": "string"
  },
  "some-number": {
    "count": 2,
    "type": "number"
  },
  "some-object": {
    "count": 2,
    "type": "object"
  },
  "some-string": {
    "count": 2,
    "type": "string"
  }
}
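The count becomes more informative when keys are optional or their types vary between objects. As a sketch only, the hypothetical function profile_array_objects_with_types below (not one of the original functions) reports every distinct type seen for a key instead of just the first one. It assumes jq 1.5 or later and a made-up input.

```shell
# Made-up input: "id" changes type between objects, "tag" is present only once.
mixed='[{"id": 1, "tag": "x"}, {"id": "two"}]'
by_key=$(echo "$mixed" | jq -c '
  # Hypothetical variant: count occurrences and collect all distinct types per key.
  def profile_array_objects_with_types:
      map(to_entries | map({key: .key, type: (.value | type)}))
      | flatten(1) | group_by(.key)
      | map({key: .[0].key, value: {count: length, types: (map(.type) | unique)}})
      | from_entries;
  profile_array_objects_with_types')
echo "$by_key"
# prints {"id":{"count":2,"types":["number","string"]},"tag":{"count":1,"types":["string"]}}
```

A count lower than the array length marks a key as optional, and more than one entry in types marks a key whose values are inconsistent.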