ttristan/RFC: JSON trivago Schema definition for cross-schema validation.md

RFC: JSON Schema definition for cross-schema validation


Proposed names: "darkwing duck", "JSON trivago Schema", "JSON Super Schema", "JSON Meta Schema", "JSON Wrapped Schema", "JSON Relational Schema", "JSON Linked Schema"

Problem description

We need to validate the API responses to our requests to remote partner services, using JSON Schema validation with Ajv. For this, we need to reference data that lives outside of the JSON Schema itself. For example: if the request body contains the property id: 5, we expect the response to contain id: 5 as well. While this value is static for this particular request, it changes from one request to the next. The question is how such a relation could be encoded inside a JSON Schema file.
In detail: we define the API specification and the standard for how partner REST services should be exposed. We then call their services to request the data we provide on our platform. For this, we design the corresponding JSON Schema files and use them in a validation and verification process. Within this process, we expect all of the data provided by these services to be consistent. However, some of the data provided by a service is related to previous requests/responses, and we want to encode such relationships within our schema files. With the current JSON Schema spec (draft-07) this is not possible: all we can do is make sure that id exists and has the correct type, not that it has the expected value.
Approach


validate data relations with a separate (wrapping) schema definition file
leave JSON Schema itself untouched and wrap another layer around it, so that data can be validated across schemas
use the wrapping schema to express validation logic such as request.start_date == availability.start_date == booking.start_date
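
A minimal sketch of how such a wrapping rule could be evaluated, in TypeScript since Ajv runs in Node.js. It assumes that the payloads collected so far are stored under names (`request`, `availability`, `booking`) and that a rule is simply a list of dot-separated paths whose values must all be equal. `RelationRule`, `resolvePath` and `checkRule` are hypothetical names, not part of Ajv or JSON Schema, and the actual wrapper format is still open.

```typescript
// Hypothetical wrapper format: a rule lists paths whose values must all be equal.
// A path is "<document name>.<property>[.<nested property>...]"; arrays are not supported here.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface RelationRule {
  description: string;
  equal: string[];
}

interface RuleResult {
  rule: RelationRule;
  valid: boolean;
  values: Record<string, Json | undefined>; // resolved value per path, for error reporting
}

// Walks a dot-separated path through the named documents; undefined if a segment is missing.
function resolvePath(documents: Record<string, Json>, path: string): Json | undefined {
  let current: Json | undefined = documents;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[segment];
  }
  return current;
}

// A rule holds if every path resolves and all resolved values are strictly equal.
// Strict equality is enough for the scalar values (ids, dates) this sketch targets.
function checkRule(documents: Record<string, Json>, rule: RelationRule): RuleResult {
  const values: Record<string, Json | undefined> = {};
  for (const path of rule.equal) {
    values[path] = resolvePath(documents, path);
  }
  const resolved = rule.equal.map((path) => values[path]);
  const valid =
    resolved.every((value) => value !== undefined) &&
    resolved.every((value) => value === resolved[0]);
  return { rule, valid, values };
}

// The payloads collected so far, keyed by a name the rules can refer to.
const documents: Record<string, Json> = {
  request: { start_date: "2018-06-01" },
  availability: { start_date: "2018-06-01", price: 120 },
  booking: { start_date: "2018-06-02" },
};

// request.start_date == availability.start_date == booking.start_date
const sameStartDate: RelationRule = {
  description: "start dates match across request, availability and booking",
  equal: ["request.start_date", "availability.start_date", "booking.start_date"],
};

console.log(checkRule(documents, sameStartDate).valid); // false: booking.start_date differs
```

A declarative list of paths keeps the rules in data rather than code, so they could live in their own wrapping schema file next to the untouched JSON Schemas.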

Example workflow


availability request -> now the state knows about the availability request and response schema
booking request -> now the state knows about booking request and response schema
hypothetical schema with rules -> now that the state knows about all requests, we can check for possible relations
run all schema definitions through Ajv and create the final JSON Schema with all data
generate validation output and provide to UI
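
The workflow above could be organized around a small state object. This is a sketch under a few assumptions. Every step is recorded with its request, its response and a per-document validator. The validator is a plain function standing in for one compiled by Ajv, so the example runs without dependencies. Relation rules are given as predicates for brevity. All names (`ValidationState`, `CrossRule`, `field`, ...) are hypothetical.

```typescript
// Hypothetical names throughout. The per-document validator stands in for a function
// compiled by Ajv and is injected so this sketch has no dependencies.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns error messages for one document; an empty list means valid.
type SchemaValidator = (data: Json) => string[];

interface RecordedStep {
  name: string; // e.g. "availability" or "booking"
  request: Json;
  response: Json;
  validateResponse: SchemaValidator;
}

interface CrossRule {
  description: string;
  requires: string[]; // steps that must have been recorded before the rule can run
  check: (steps: Map<string, RecordedStep>) => boolean;
}

interface ReportEntry {
  source: string; // step name or rule description
  valid: boolean;
  messages: string[];
}

// Reads a top-level property of a JSON object; undefined for anything else.
function field(value: Json | undefined, key: string): Json | undefined {
  return value !== null && typeof value === "object" && !Array.isArray(value) ? value[key] : undefined;
}

class ValidationState {
  private readonly steps = new Map<string, RecordedStep>();

  // Workflow steps 1 and 2: each request/response pair is recorded as it happens.
  record(step: RecordedStep): void {
    this.steps.set(step.name, step);
  }

  // Workflow steps 3 to 5: validate every recorded response on its own, then apply the
  // cross-schema rules, and return a flat list a UI could render.
  report(rules: CrossRule[]): ReportEntry[] {
    const entries: ReportEntry[] = [];
    this.steps.forEach((step) => {
      const messages = step.validateResponse(step.response);
      entries.push({ source: step.name, valid: messages.length === 0, messages });
    });
    for (const rule of rules) {
      const missing = rule.requires.filter((name) => !this.steps.has(name));
      if (missing.length > 0) {
        entries.push({ source: rule.description, valid: false, messages: [`missing steps: ${missing.join(", ")}`] });
        continue;
      }
      const valid = rule.check(this.steps);
      entries.push({ source: rule.description, valid, messages: valid ? [] : ["relation violated"] });
    }
    return entries;
  }
}

// Example: a trivial stand-in validator and one relation rule.
const requireStartDate: SchemaValidator = (data) =>
  typeof field(data, "start_date") === "string" ? [] : ["start_date must be a string"];

const sameStartDate: CrossRule = {
  description: "availability and booking start dates match",
  requires: ["availability", "booking"],
  check: (steps) =>
    field(steps.get("availability")?.response, "start_date") ===
    field(steps.get("booking")?.response, "start_date"),
};

const state = new ValidationState();
state.record({ name: "availability", request: { start_date: "2018-06-01" }, response: { start_date: "2018-06-01" }, validateResponse: requireStartDate });
state.record({ name: "booking", request: { start_date: "2018-06-01" }, response: { start_date: "2018-06-01" }, validateResponse: requireStartDate });

console.log(state.report([sameStartDate]).every((entry) => entry.valid)); // true
```

Keeping per-document errors and relation errors in one flat list means the UI does not need to know which layer produced a message.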

Why we use Ajv


Ajv provides all the functionality we need for thorough validation in a Node.js environment
Ajv is widely adopted and has strong community support
Ajv is easy to extend with plugins
Ajv natively supports $data references within one JSON Schema definition file (see the Ajv documentation on $data references)
Validation results are consistent, clear and easy to use
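
For reference, this is roughly what $data covers today: a relation between two properties of the same document. Below is a sketch of a draft-07 schema, assuming Ajv is instantiated as `new Ajv({ $data: true })`; the property names are hypothetical. A relative JSON pointer can only walk within the data being validated, so it cannot reach a value from a different request or response.

```typescript
// Draft-07 schema using Ajv's $data extension. It only takes effect when Ajv is created
// with the `$data: true` option. The property names are illustrative.
const bookingResponseSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  required: ["start_date", "confirmed_start_date"],
  properties: {
    start_date: { type: "string" },
    // "1/start_date" is a relative JSON pointer: one level up from confirmed_start_date
    // (the enclosing object), then down to its start_date property.
    confirmed_start_date: { const: { $data: "1/start_date" } },
  },
};
```

With that option set, `ajv.compile(bookingResponseSchema)` returns a validate function that rejects a response whose `confirmed_start_date` differs from its `start_date`.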

Discussed and discarded ideas

Why we will not use Hyper Schema


Hyper-schema is only concerned with one resource and set of associated links at a time. Just as a web browser works with only one HTML page at a time, with no concept of whether or how that page functions as part of a "site", a hyper-schema-aware user agent works with one resource at a time, without any concept of whether or how that resource fits into an API. Therefore, hyper-schema is suitable for use within an API, but is not suitable for the description of APIs as complete entities in their own right. There is no way to describe concepts at the API scope, rather than the resource and link scope, and such descriptions are outside of the boundaries of JSON Hyper-Schema.


A hyper-schema implementation is not itself expected to construct and send requests.


even if it is possible to reference resources outside of the current schema scope via Hyper-Schema links, those resources would again have to be provided in JSON Schema form and would therefore have to be converted beforehand or at runtime
there is no actual link to the data or information of connected schemas
JSON Hyper-Schema is only concerned with defining links, nothing more. It is like clicking a hyperlink on a website: the website you are forwarded to has no knowledge of the one you came from.

Why we will not use $ref


suggested in https://github.com/json-schema-org/json-schema-spec/issues/549#issuecomment-370279299
$ref abstracts out a subschema and is mainly used to keep a schema DRY by referencing a subschema multiple times; we do not have this use case
$ref points to other JSON Schema files via relative paths. For this to work, we would have to transform a previous request/response into a static JSON Schema and then reference it from the next schema dynamically.
Would probably cause confusion and more problems than it solves
same reasons why we will not use custom keywords
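
To illustrate the second point, here is a sketch of the conversion $ref would require. The previous response is frozen into a schema of `const` values, that schema is registered under an `$id`, and the next schema references it. The `snapshotSchema` helper and the example.com `$id` URIs are hypothetical.

```typescript
// Hypothetical helper and $id URIs; sketches the conversion step that $ref would require.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Freezes an observed response into a draft-07 schema that accepts exactly those values.
function snapshotSchema(id: string, response: { [key: string]: Json }) {
  const properties: { [key: string]: { const: Json } } = {};
  for (const key of Object.keys(response)) {
    properties[key] = { const: response[key] };
  }
  return { $id: id, type: "object", required: Object.keys(response), properties };
}

// Generated at runtime: the availability response becomes a schema of its own ...
const availabilitySnapshot = snapshotSchema(
  "https://example.com/snapshots/availability-response.json",
  { start_date: "2018-06-01", price: 120 },
);

// ... so that the booking response schema can reference one of its properties via $ref.
const bookingResponseSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  properties: {
    start_date: {
      $ref: "https://example.com/snapshots/availability-response.json#/properties/start_date",
    },
  },
};
```

Both schemas would have to be regenerated and registered with the validator again (in Ajv, for example via `ajv.addSchema`) for every single request.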

Why we will not use custom keywords


We want to stick with the standard as much as possible: we do not want to reinvent the wheel, nor pollute the standard solution (here JSON Schema) with custom functionality that might cause confusion.