RFC: JSON Schema definition for cross-schema validation

Proposed names: "darkwing duck", "JSON trivago Schema", "JSON Super Schema", "JSON Meta Schema", "JSON Wrapped Schema", "JSON Relational Schema", "JSON Linked Schema"

Problem description

We need to validate the API responses we receive from remote partner services using JSON Schema validation with Ajv. For this, we need to reference data that lives outside the schema being validated. For example: if the request body contains the property id: 5, we expect the response to contain id: 5 as well. This value is fixed for one particular request, but it changes from request to request. The question is how such a relation can be encoded inside a JSON Schema file.

In detail: We define the API specification and the standard for how partner REST services should be exposed. We then call these services to request their data and provide it on our platform. For this, we design the respective JSON Schema files and use them in a validation and verification process. Within this process, we expect all data provided by these services to be consistent. However, some of the data provided by a service is related to previous requests/responses, and we want to encode such relationships within our Schema files. With the current JSON Schema spec (draft-07) this is not possible. All we can do is make sure that id exists and has the correct type; we cannot check its actual value.

Approach

  • validate data relations with a separate (wrapping) schema definition file
  • leave JSON Schema itself untouched, wrap another layer around it so that data can be validated across Schemas
  • express relations in the wrapping schema as rules such as request.start_date == availability.start_date == booking.start_date
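
A relation like the one in the last bullet could be written down as a rule in the wrapping file and checked against the payloads collected so far. The TypeScript sketch below is only an illustration: the `EqualityRule` shape, the `<message>.<property>` path syntax, and the sample dates are assumptions made for this example and are not part of JSON Schema or Ajv.

```typescript
// Hypothetical shape of one rule in the wrapping schema: every listed
// "<message>.<property>" path must hold the same value.
type EqualityRule = { equal: string[] };

// Payloads collected so far, keyed by a message name chosen for this sketch.
type Payloads = Record<string, Record<string, unknown>>;

// Look up "<message>.<property>" in the collected payloads.
function lookup(payloads: Payloads, path: string): unknown {
  const [message, property] = path.split(".");
  return payloads[message]?.[property];
}

// Return the paths whose value differs from the first path in the rule.
function violations(rule: EqualityRule, payloads: Payloads): string[] {
  const [first, ...rest] = rule.equal;
  const expected = lookup(payloads, first);
  return rest.filter((path) => lookup(payloads, path) !== expected);
}

const startDateRule: EqualityRule = {
  equal: ["request.start_date", "availability.start_date", "booking.start_date"],
};

// Made-up sample data: the booking disagrees with the other two messages.
const samplePayloads: Payloads = {
  request: { start_date: "2019-05-01" },
  availability: { start_date: "2019-05-01" },
  booking: { start_date: "2019-05-02" },
};

console.log(violations(startDateRule, samplePayloads)); // reports "booking.start_date"
```

The per-message JSON Schema files stay untouched in this design; the rule only names values that must agree across them.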

Example workflow

  1. availability request -> now the state knows about the availability request and response schema
  2. booking request -> now the state knows about booking request and response schema
  3. hypothetical schema with rules -> now that the state knows about all requests, we can check for possible relations
  4. run all schema definitions through Ajv and create the final JSON Schema with all data
  5. generate validation output and provide to UI
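
One possible reading of the steps above, sketched in TypeScript: the state records every exchange, then merges all payloads into one document and all schemas into one combined schema, so that a rule can point from one message to another. The class name `WorkflowState`, the exchange names, the rule layout, and the sample dates are assumptions for illustration. The sketch stops before step 5 and does not call Ajv; it only relies on Ajv's documented `$data` option as a note in the comments.

```typescript
type Json = Record<string, unknown>;

// One recorded exchange: the schemas we wrote plus the data sent and received.
interface Exchange {
  requestSchema: Json;
  responseSchema: Json;
  request: Json;
  response: Json;
}

class WorkflowState {
  private exchanges: Record<string, Exchange> = {};

  // Steps 1 and 2: remember each exchange under a name such as "availability".
  record(name: string, exchange: Exchange): void {
    this.exchanges[name] = exchange;
  }

  // Step 4, data side: one document holding every payload.
  combinedData(): Json {
    const data: Json = {};
    for (const [name, e] of Object.entries(this.exchanges)) {
      data[name] = { request: e.request, response: e.response };
    }
    return data;
  }

  // Step 4, schema side: nest the untouched per-message schemas and attach
  // the relation rules from step 3 via allOf (omitted when there are none).
  combinedSchema(rules: Json[]): Json {
    const properties: Json = {};
    for (const [name, e] of Object.entries(this.exchanges)) {
      properties[name] = {
        type: "object",
        properties: { request: e.requestSchema, response: e.responseSchema },
      };
    }
    const base: Json = { type: "object", properties };
    return rules.length > 0 ? { ...base, allOf: rules } : base;
  }
}

// Made-up schema and payloads for the sketch.
const dateSchema: Json = { type: "object", properties: { start_date: { type: "string" } } };
const workflow = new WorkflowState();
workflow.record("availability", {
  requestSchema: dateSchema,
  responseSchema: dateSchema,
  request: { start_date: "2019-05-01" },
  response: { start_date: "2019-05-01" },
});
workflow.record("booking", {
  requestSchema: dateSchema,
  responseSchema: dateSchema,
  request: { start_date: "2019-05-01" },
  response: { start_date: "2019-05-01" },
});

// Step 3: a rule written as a schema fragment with a $data reference.
const relationRules: Json[] = [
  {
    properties: {
      booking: {
        properties: {
          request: {
            properties: {
              start_date: { const: { $data: "/availability/request/start_date" } },
            },
          },
        },
      },
    },
  },
];

const combinedSchema = workflow.combinedSchema(relationRules);
const combinedData = workflow.combinedData();
// Step 5 would compile combinedSchema with an Ajv instance created with
// `$data: true`, validate combinedData, and pass the errors on to the UI.
console.log(Object.keys(combinedData)); // availability, booking
```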

Why we use Ajv

  • Ajv offers all the validation functionality we need in the Node.js environment
  • Ajv is widely adopted and has large community support
  • Ajv is easy to extend with plugins
  • Ajv provides a native $data reference within one JSON Schema definition file
  • Validation results are consistent, very clear and easy to use
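
To show what the `$data` reference can and cannot reach, here is a small TypeScript sketch. The schema fragment uses Ajv's `$data` extension (enabled with the `$data: true` option) to require that `response.id` equals `request.id`. The wrapper object with `request` and `response` keys is an assumption for this example, and `resolvePointer` is a simplified stand-in for the pointer lookup, not Ajv's implementation.

```typescript
// Schema fragment using Ajv's $data extension. "/request/id" is a JSON
// pointer into the document being validated, not into another file.
const exchangeSchema = {
  type: "object",
  properties: {
    request: { type: "object", properties: { id: { type: "integer" } } },
    response: {
      type: "object",
      properties: { id: { type: "integer", const: { $data: "/request/id" } } },
    },
  },
};

// Simplified pointer lookup to show which value the reference resolves to.
// Ignores the ~0 and ~1 escapes defined for JSON pointers.
function resolvePointer(data: unknown, pointer: string): unknown {
  return pointer
    .split("/")
    .slice(1)
    .reduce<unknown>(
      (node, key) =>
        typeof node === "object" && node !== null
          ? (node as Record<string, unknown>)[key]
          : undefined,
      data,
    );
}

// Made-up exchange in which the partner answered with the wrong id.
const sampleExchange = { request: { id: 5 }, response: { id: 7 } };
const idPointer = exchangeSchema.properties.response.properties.id.const.$data;
const expectedId = resolvePointer(sampleExchange, idPointer);

console.log(expectedId === sampleExchange.response.id); // false: 5 was sent, 7 came back
```

Because the pointer is resolved against the single document under validation, request and response have to be placed in the same document before such a reference can relate them.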

Discussed and discarded ideas

Why we will not use Hyper Schema

Hyper-schema is only concerned with one resource and set of associated links at a time. Just as a web browser works with only one HTML page at a time, with no concept of whether or how that page functions as part of a "site", a hyper-schema-aware user agent works with one resource at a time, without any concept of whether or how that resource fits into an API. Therefore, hyper-schema is suitable for use within an API, but is not suitable for the description of APIs as complete entities in their own right. There is no way to describe concepts at the API scope, rather than the resource and link scope, and such descriptions are outside of the boundaries of JSON Hyper-Schema.

A hyper-schema implementation is not itself expected to construct and send requests.

  • even if it is possible to provide resources outside of the current Schema scope via Hyper Schema links, those resources would again have to be provided in JSON Schema form and therefore be converted beforehand or at runtime
  • there is no actual link to the data or information of connected schemas
  • JSON Hyper Schema is only concerned with defining links, nothing more. It is like clicking a (hyper) link on a website: the website you get forwarded to has no knowledge of the one you came from.

Why we will not use $ref

  • suggested in https://github.com/json-schema-org/json-schema-spec/issues/549#issuecomment-370279299
  • $ref abstracts out a sub schema and is mainly used to keep a schema DRY by referencing the same sub schema multiple times; we do not have this use case
  • refs point to other JSON Schema files with relative paths. For this to work, we would have to transform a previous request/response into a static JSON Schema and then reference it in the next Schema dynamically.
  • it would probably cause confusion and create more problems than it solves
  • the same reasons apply that keep us from using custom keywords
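
To make the transformation step mentioned above concrete, this TypeScript sketch turns a recorded payload into a static schema that pins each top-level property to its observed value. The function name `toStaticSchema` and the file path in the comment are hypothetical; the sketch illustrates the extra step the $ref approach would require and is not something we plan to build.

```typescript
type Json = Record<string, unknown>;

// Turn a recorded payload into a schema that pins every top-level
// property to the value observed in that payload.
function toStaticSchema(payload: Json): Json {
  const properties: Json = {};
  for (const [key, value] of Object.entries(payload)) {
    properties[key] = { const: value };
  }
  return { type: "object", required: Object.keys(payload), properties };
}

const previousRequest: Json = { id: 5 };
const generatedSchema = toStaticSchema(previousRequest);

// The next schema would then have to reference the generated file, e.g.
// { "$ref": "generated/availability-request.json#/properties/id" }
// (hypothetical path), and the file would need regenerating per request.
console.log(JSON.stringify(generatedSchema));
// {"type":"object","required":["id"],"properties":{"id":{"const":5}}}
```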

Why we will not use custom keywords

  • We want to stick with the standard as much as possible and not reinvent the wheel while also polluting the standard solution (here JSON Schema) with custom functionality that might cause confusion