@jayrbolton
Created July 23, 2020 22:14
More reusable json-schemas

Goals:

  1. Allow local schemas to include other local schema files without hardcoding any absolute file paths
  2. Allow local schemas to include remote schemas, where the base URL of the remote schema is configurable rather than hard-coded
  3. Serve schema libraries that can be used remotely over the http(s) protocol with consistent internal $id and $ref fields where the base URL can be configured

1. Including local schemas

We use a template variable called {base} to insert the project's absolute file path without hard-coding it:

{
    "$id": "{base}/schema.json",
    "$schema": "http://json-schema.org/draft-07/schema#"
}

We can refer to other schemas in the same project using paths relative to {base}:

{
    "$id": "{base}/schema.json",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "amount": {"$ref": "{base}/lib/defs1.json#/definitions/currency"}
    }
}

Then, to use these schemas, initialize a Validator with a base path:

import pathlib

dirpath = str(pathlib.Path().absolute())  # current working directory
validator = Validator("schema.json", dirpath, includes={})  # no remote aliases yet
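
To make the substitution concrete, here is a minimal sketch of what happens to a single $ref once {base} is filled in. It assumes the interpolation is done with Python's str.format, as in the prototype code at the end of this gist; the directory /srv/project is a made-up path used only for illustration.

```python
# Hypothetical project directory, used only for illustration
dirpath = "/srv/project"

# Local paths are turned into file:// URIs before interpolation
base = dirpath if dirpath.startswith("file://") else "file://" + dirpath

ref = "{base}/lib/defs1.json#/definitions/currency"
resolved = ref.format(base=base)
print(resolved)  # file:///srv/project/lib/defs1.json#/definitions/currency
```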

2. Including remote schemas

Use custom template variable names to reference additional configurable base URIs in your schema, such as an HTTP server (see section 3). In the example below, we add a reference for "product_id" whose base URI is given by the alias {remote1}.

{
    "$id": "{base}/schema.json",
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "amount": {"$ref": "{base}/lib/defs1.json#/definitions/currency"},
        "product_id": {"$ref": "{remote1}/defs.json#/definitions/product_id"}
    }
}

When you initialize the Validator class, you can configure the location of {remote1}:

includes = {'remote1': 'http://localhost:5000'}
validator = Validator("schema.json", dirpath, includes)
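
As a rough sketch of how {base} and the extra aliases might combine, the snippet below merges them into one lookup table and applies it to the two $ref values from the schema above. The base path and the {remote2} alias are hypothetical, and the use of str.format is an assumption taken from the prototype code at the end of this gist.

```python
base = "file:///srv/project"  # hypothetical local base path
includes = {"remote1": "http://localhost:5000"}

# One lookup table holds {base} plus every configured alias
variables = {"base": base, **includes}

properties = {
    "amount": {"$ref": "{base}/lib/defs1.json#/definitions/currency"},
    "product_id": {"$ref": "{remote1}/defs.json#/definitions/product_id"},
}
resolved = {
    name: {"$ref": prop["$ref"].format(**variables)}
    for name, prop in properties.items()
}
print(resolved["product_id"]["$ref"])
# http://localhost:5000/defs.json#/definitions/product_id

# An alias that was never configured makes str.format raise KeyError
try:
    "{remote2}/other.json".format(**variables)
    missing = None
except KeyError as err:
    missing = err.args[0]
print(missing)  # remote2
```

The prototype helper catches this KeyError and leaves the value untouched, so a misspelled alias would likely surface later, when the reference is resolved, rather than when the schema is loaded.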

3. Serve schema libraries

When you write a set of schemas that can be reused elsewhere, you can serve them over HTTP using the SchemaServer.

The schemas in your library should all use the {base} variable in their $ids and $refs:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "{base}/defs.json",
    "definitions": {
        "product_id": {
            "type": "string",
            "pattern": "\\d+\\.\\d+"
        }
    }
}

Start the server:

SchemaServer(
  schemas_path='./schema-lib/',
  base_uri='http://localhost:5000',
  schema_includes={},
  sanic_config={'port': 5000},
  app_name='myschemas'
)

Your schema will be served over HTTP with the correct $id and $ref fields populated based on the base_uri option:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "http://localhost:5000/defs.json",
    "definitions": {
        "product_id": {
            "type": "string",
            "pattern": "\\d+\\.\\d+"
        }
    }
}

Schemas will also be served using the same path hierarchy found in the schemas_path directory.
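
The mapping from files to URLs can be sketched as follows. This is only an illustration of the path hierarchy, not the server itself: the helper name served_urls is hypothetical, and the temporary directory stands in for a real schema library.

```python
import glob
import os
import tempfile


def served_urls(schemas_path, base_uri):
    """List the URL each .json file under schemas_path would be served at."""
    pattern = os.path.join(schemas_path, "**", "*.json")
    urls = []
    for path in glob.iglob(pattern, recursive=True):
        # Path relative to the library root, with forward slashes for URLs
        subpath = os.path.relpath(path, schemas_path).replace(os.sep, "/")
        urls.append(f"{base_uri}/{subpath}")
    return sorted(urls)


# Build a throwaway library containing defs.json and lib/defs1.json
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "lib"))
    for name in ("defs.json", os.path.join("lib", "defs1.json")):
        with open(os.path.join(root, name), "w") as fd:
            fd.write("{}")
    urls = served_urls(root, "http://localhost:5000")

print(urls)
# ['http://localhost:5000/defs.json', 'http://localhost:5000/lib/defs1.json']
```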

Source Code

Working prototype code

Validator class:

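import json
from typing import Optional, Union

import jsonschema


class Validator:
    """Load a schema and interpolate template variables in its $id and $ref fields."""
    schema: dict

    def __init__(self,
                 schema: Union[str, dict],
                 base: str,
                 includes: Optional[dict] = None):
        # Accept either a path to a schema file or an already-loaded dict
        if isinstance(schema, str):
            with open(schema) as fd:
                schema = json.load(fd)
        # Local references are resolved as file:// URIs
        if not base.startswith("file://"):
            base = "file://" + base
        # Aliases are merged in after 'base', so an alias named 'base' overrides it
        interps = {'base': base, **(includes or {})}
        self.schema = _format_schema(schema, interps)

    def validate(self, example):
        """Raise jsonschema.ValidationError if example does not match the schema."""
        jsonschema.validate(example, self.schema)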

The library server:

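import glob
import json
import os

import sanic
import sanic.response


class SchemaServer:
    """Serve every .json schema under schemas_path with interpolated $id and $ref fields."""

    def __init__(self,
                 schemas_path: str,
                 base_uri: str,
                 schema_includes: dict,
                 sanic_config: dict,
                 app_name: str = "schemas"):
        # 'base' always resolves to the public URI of this server
        includes = {**schema_includes, 'base': base_uri}
        # Mapping of URL subpath to interpolated schema
        schemas = {}
        paths = os.path.join(schemas_path, "**", "*.json")
        for path in glob.iglob(paths, recursive=True):
            with open(path) as fd:
                schema = json.load(fd)
            # Use forward slashes so the key matches the URL path on any OS
            subpath = os.path.relpath(path, schemas_path).replace(os.sep, "/")
            schemas[subpath] = _format_schema(schema, includes)

        async def root(request, path: str = ''):
            if path not in schemas:
                return sanic.response.raw(b'', status=404)
            return sanic.response.json(schemas[path])
        app = sanic.Sanic(app_name)
        # Distinct route names, since newer Sanic releases reject duplicate names
        app.add_route(root, "/", name="root")
        app.add_route(root, "/<path:path>", name="schema")
        app.run(**sanic_config)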

Helpers for traversing over nested objects and interpolating variables in $id and $ref fields in the schema:

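import copy


def _iter_nested(obj):
    """Yield (key, val, parent) for every key in nested dicts, descending into lists"""
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield key, val, obj
            yield from _iter_nested(val)
    elif isinstance(obj, list):
        for item in obj:
            yield from _iter_nested(item)


def _format_schema(schema, data):
    """Return a copy of schema with template variables filled in for $id and $ref"""
    # Deep copy so that nested dicts in the caller's schema are not mutated
    schema = copy.deepcopy(schema)
    for key, val, nested in _iter_nested(schema):
        if key in ('$id', '$ref') and isinstance(val, str):
            try:
                nested[key] = val.format(**data)
            except KeyError:
                # Unknown template variable: leave the value unchanged
                continue
    return schema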