@clemensv
Created July 31, 2023 08:38
Resolving Competing, Paradoxical Validation/Code-Gen Constraints in JSON Schema

Resolving Paradoxical Constraints in JSON Schema

Introduction

JSON Schema is a versatile tool for defining the structure of JSON data and validating documents against it. However, complex scenarios can lead to paradoxical constraints, especially when the same schema also feeds code generation tools. In this article, we look in depth at one such paradox, which emerged while defining message structures for various protocols, and discuss a practical solution.

The Problem Statement

Imagine a system where messages are passed using different protocols, such as AMQP, HTTP, MQTT, Kafka, and CloudEvents. Each protocol has a distinct message structure but shares certain common attributes. These shared attributes are consolidated in a base message definition, called definition in our JSON Schema.

The definition schema could look something like this:

"definition": {
  "type": "object",
  "description": "a message definition",
  "properties": {
    "schemaUrl": {
      "type": "string",
      "description": "A URL to the schema of the message's data.",
      "format": "uri-reference"
    },
    "schemaFormat": {
      "type": "string",
      "description": "Declares the schema format"
    },
    "format": {
      "type": "string",
      "description": "Specifies the `format` of this definition."
    },
    "metadata": {
      "type": "object"
    }
  },
  "required": [
    "format"
  ],
  ...
}

Protocol-specific definitions, such as amqpDefinition, mqttDefinition, etc., are created, inheriting properties from the base definition via the allOf keyword:

"amqpDefinition": {
  "type": "object",
  "properties": {
    "metadata": {
      "description": "AMQP message metadata",
      "$ref": "#/definitions/amqpMetadata"
    },
    "format": {
      "type": "string",
      "description": "Specifies the `format` of this definition.",
      "enum": ["AMQP", "AMQP/1.0"]
    }
  },
  "required": [
    "metadata", "format"
  ],
  "allOf": [
    {
      "$ref": "...#/definitions/definition"
    }
  ]
}

To manage a polymorphic dictionary of such definitions, a type with additionalProperties was defined. This definitions type is designed to accept additional properties that match either the base definition or any of the protocol-specific definitions.

"definitions": {
  "type": "object",
  "title": "definitions",
  "description": "A collection of Message Definitions.",
  "additionalProperties": {
    "oneOf": [
      {"$ref": "#/definitions/definition"},
      {"$ref": "...amqpDefinition"},
      {"$ref": "...mqttDefinition"},
      ...
    ]
  }
}

But this setup led to a paradox. Because every specific definition includes the base definition via allOf, an instance that matches one of the specific definitions also matches the base definition. This violates the oneOf constraint, which requires that exactly one of the listed schemas match. Simply dropping the base definition from the list was not an option, because certain code generation tools depend on it. Tools like NSwag pick the first entry of a oneOf list to determine the element type of a collection.

The Paradox

The paradox is that any instance conforming to one of the specific definitions also conforms to the base definition. Each specific definition only adds constraints on top of the base, so the set of instances accepted by the base definition is a superset of the instances accepted by any specific definition. A specific instance therefore matches two branches of the oneOf and is rejected, which puts the needs of data validation in conflict with those of code generation.
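The rejection can be reproduced with a small sketch. The validator below is a hypothetical, stdlib-only helper that understands just the keywords used in this article (type, required, properties, enum, allOf, oneOf and local $ref); it is not a complete JSON Schema implementation. The schemas are cut down from the ones above, with the metadata reference omitted and a made-up "entry" schema standing in for the additionalProperties value.

```python
# Minimal sketch: only the keywords used in this article are supported.
# resolve(), is_valid() and the "entry" schema are illustrative names.

def resolve(ref, root):
    """Follow a local '#/a/b' reference into the root schema."""
    node = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node

def is_valid(instance, schema, root):
    if "$ref" in schema and not is_valid(instance, resolve(schema["$ref"], root), root):
        return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in instance and not is_valid(instance[key], sub, root):
                return False
    # allOf: every subschema must match.
    if any(not is_valid(instance, sub, root) for sub in schema.get("allOf", [])):
        return False
    # oneOf: exactly one subschema must match.
    if "oneOf" in schema:
        if sum(is_valid(instance, sub, root) for sub in schema["oneOf"]) != 1:
            return False
    return True

ROOT = {
    "definitions": {
        "definition": {
            "type": "object",
            "properties": {"format": {"type": "string"}, "metadata": {"type": "object"}},
            "required": ["format"],
        },
        "amqpDefinition": {
            "type": "object",
            "properties": {"format": {"type": "string", "enum": ["AMQP", "AMQP/1.0"]}},
            "required": ["metadata", "format"],
            "allOf": [{"$ref": "#/definitions/definition"}],
        },
        # Stand-in for the additionalProperties schema of the dictionary.
        "entry": {
            "oneOf": [
                {"$ref": "#/definitions/definition"},
                {"$ref": "#/definitions/amqpDefinition"},
            ]
        },
    }
}

defs = ROOT["definitions"]
amqp_message = {"format": "AMQP", "metadata": {}}
generic_message = {"format": "SomethingElse"}

matches_base = is_valid(amqp_message, defs["definition"], ROOT)
matches_amqp = is_valid(amqp_message, defs["amqpDefinition"], ROOT)
amqp_passes_one_of = is_valid(amqp_message, defs["entry"], ROOT)
generic_passes_one_of = is_valid(generic_message, defs["entry"], ROOT)

print(matches_base, matches_amqp)   # the AMQP instance matches both schemas
print(amqp_passes_one_of)           # so the oneOf rejects it
print(generic_passes_one_of)        # a base-only instance is accepted
```

The perfectly valid AMQP instance is the one that fails, while an instance that only satisfies the base definition passes.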

The Solution

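To resolve this paradox, we restructured the schema so that the oneOf succeeds both for instances that match only the base definition and for instances that also match a concrete definition. We created a new concreteDefinitions schema that combines all the specific definitions, but not the base definition, in a oneOf. Then we revised the additionalProperties of the definitions object into two nested levels. The inner oneOf accepts an instance that matches either the base definition or concreteDefinitions, but not both, so it succeeds only for instances that match the base definition alone. The outer oneOf pairs that inner check with a second reference to concreteDefinitions, which succeeds for instances that match a concrete definition. Because the inner check fails exactly when the second branch succeeds, every valid instance matches exactly one branch of the outer oneOf. In effect, the base definition accounts for a match only if none of the concrete definitions matches, and it is still listed first, inside the inner oneOf.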

"concreteDefinitions": {
  "oneOf": [
    {"$ref": "...amqpDefinition"},
    {"$ref": "...mqttDefinition"},
    ...
  ]
},
"definitions": {
  "type": "object",
  "additionalProperties": {
    "oneOf": [
      {
        "oneOf": [
          {"$ref": "#/definitions/definition"},
          {"$ref": "#/definitions/concreteDefinitions"}
        ]
      },
      {"$ref": "#/definitions/concreteDefinitions"}
    ]
  }
}
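The nested construction can be checked as plain boolean logic, without a validator. The sketch below assumes two facts about an instance: whether it matches the base definition (base_ok) and whether it matches concreteDefinitions (concrete_ok). The function names are illustrative, and the original list is simplified to a single concrete branch.

```python
from itertools import product

def one_of(*branches):
    """oneOf holds when exactly one branch validates."""
    return sum(bool(b) for b in branches) == 1

def original_entry(base_ok, concrete_ok):
    # "oneOf": [base, concrete]
    return one_of(base_ok, concrete_ok)

def restructured_entry(base_ok, concrete_ok):
    # "oneOf": [ {"oneOf": [base, concreteDefinitions]}, concreteDefinitions ]
    return one_of(one_of(base_ok, concrete_ok), concrete_ok)

rows = []
for base_ok, concrete_ok in product([False, True], repeat=2):
    if concrete_ok and not base_ok:
        # Cannot happen: every concrete definition includes the base via allOf.
        continue
    rows.append((base_ok, concrete_ok,
                 original_entry(base_ok, concrete_ok),
                 restructured_entry(base_ok, concrete_ok)))

print("base  concrete  original  restructured")
for row in rows:
    print("  ".join(f"{str(value):<8}" for value in row))
```

Only the last row differs: an instance that matches both the base and a concrete definition is rejected by the original oneOf and accepted by the restructured one. Instances that match neither are rejected in both cases.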

Conclusion

This case study underlines the flexibility of JSON Schema while highlighting the complexities that emerge when a single schema is expected to serve multiple purposes. JSON Schema is a potent tool for defining and validating data structures, but applying it in complex scenarios can be challenging. Careful, deliberate structuring of the schema is therefore paramount to ensure that it satisfies both validation and code generation.
