Goals:
- Allow local schemas to include other local schema files without hardcoding any absolute file paths
- Allow local schemas to include remote schemas, where the base URL of the remote schema can be configured (is not hard-coded)
- Serve schema libraries that can be used remotely over HTTP(S), with consistent internal $id and $ref fields where the base URL can be configured
We use a template variable called {base} to insert the project's absolute file path without hard-coding it:
{
  "$id": "{base}/schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#"
}
We can refer to other schemas under the same file path using paths that start from {base}:
{
  "$id": "{base}/schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "amount": {"$ref": "{base}/lib/defs1.json#/definitions/currency"}
  }
}
Then, to use these schemas, initialize a Validator with a base path:
dirpath = str(pathlib.Path().absolute())
validator = Validator("schema.json", dirpath)
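As a rough illustration of what the {base} substitution amounts to, the sketch below applies Python's str.format to the $id and $ref values from the schema above. The directory /home/user/project is a made-up placeholder, and the file:// prefix mirrors what the prototype Validator does for local base paths.

```python
# Hypothetical project directory, used only for illustration
dirpath = "/home/user/project"

# Local base paths become file:// URIs
base = "file://" + dirpath

# Fill the {base} template variable in the $id and $ref values
schema_id = "{base}/schema.json".format(base=base)
currency_ref = "{base}/lib/defs1.json#/definitions/currency".format(base=base)

print(schema_id)
print(currency_ref)
```

Moving the project to another directory only changes dirpath; the schema files themselves stay untouched.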
Use custom template variable names to reference additional configurable base URIs in your schema, such as an HTTP server (see section 3). In the example below, we add a reference for "product_id" whose base URI uses the alias {remote1}.
{
  "$id": "{base}/schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "amount": {"$ref": "{base}/lib/defs1.json#/definitions/currency"},
    "product_id": {"$ref": "{remote1}/defs.json#/definitions/product_id"}
  }
}
When you initialize the Validator class, you can configure the location of {remote1}:
includes = {'remote1': 'http://localhost:5000'}
validator = Validator("schema.json", dirpath, includes)
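The point of the alias is that the same schema file can be pointed at a different server by changing only the includes mapping. A minimal sketch of that idea follows; the second host name is a hypothetical example, not something this project provides.

```python
# The $ref exactly as written in the schema file
ref = "{remote1}/defs.json#/definitions/product_id"

# Two alternative configurations for the same alias
local_includes = {"remote1": "http://localhost:5000"}
other_includes = {"remote1": "https://schemas.example.com"}  # hypothetical host

local_ref = ref.format(**local_includes)
other_ref = ref.format(**other_includes)

print(local_ref)
print(other_ref)
```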
When you write a set of schemas that can be reused elsewhere, you can serve them over HTTP using the SchemaServer.
The schemas in your library should all use the {base} variable in their $id and $ref fields:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "{base}/defs.json",
  "definitions": {
    "product_id": {
      "type": "string",
      "pattern": "\\d+\\.\\d+"
    }
  }
}
Start the server:
SchemaServer(
    schemas_path='./schema-lib/',
    base_uri='http://localhost:5000',
    schema_includes={},
    sanic_config={'port': 5000},
    app_name='myschemas'
)
Your schema will be served over HTTP with the correct $id and $ref fields populated based on the base_uri option:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://localhost:5000/defs.json",
  "definitions": {
    "product_id": {
      "type": "string",
      "pattern": "\\d+\\.\\d+"
    }
  }
}
Schemas will also be served using the same path hierarchy found in the schemas_path directory.
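To make the path hierarchy concrete, the sketch below builds a throwaway directory with one nested schema file and lists the URLs it would map to. The helper name url_paths and the file names are hypothetical, and normalizing the separator to "/" is an addition for readability that the prototype does not perform.

```python
import glob
import os
import tempfile


def url_paths(schemas_path):
    """Return the path of every .json file relative to schemas_path."""
    pattern = os.path.join(schemas_path, "**", "*.json")
    return sorted(
        os.path.relpath(path, schemas_path).replace(os.sep, "/")
        for path in glob.iglob(pattern, recursive=True)
    )


with tempfile.TemporaryDirectory() as root:
    # Create root/defs.json and root/lib/defs1.json as empty schemas
    os.makedirs(os.path.join(root, "lib"))
    for name in ("defs.json", os.path.join("lib", "defs1.json")):
        with open(os.path.join(root, name), "w") as fd:
            fd.write("{}")
    served = ["http://localhost:5000/" + p for p in url_paths(root)]

print(served)
```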
Working prototype code
Validator class:
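import json
from typing import Optional, Union

import jsonschema


class Validator:
    schema: dict

    def __init__(self,
                 schema: Union[str, dict],
                 base: str,
                 includes: Optional[dict] = None):
        # Accept either a path to a schema file or an already-loaded dict
        if isinstance(schema, str):
            with open(schema) as fd:
                schema = json.load(fd)
        # Local base paths are turned into file:// URIs
        if not base.startswith("file://"):
            base = "file://" + base
        # Template variables: {base} plus any configured aliases.
        # includes is optional so that Validator(schema, base) also works.
        interps = {'base': base, **(includes or {})}
        self.schema = _format_schema(schema, interps)

    def validate(self, example):
        jsonschema.validate(example, self.schema)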
The library server:
import glob
import json
import os

import sanic


class SchemaServer:

    def __init__(self,
                 schemas_path: str,
                 base_uri: str,
                 schema_includes: dict,
                 sanic_config: dict,
                 app_name: str = "schemas"):
        # Mapping of path name to schema
        schemas = {}
        paths = os.path.join(schemas_path, "**", "*.json")
        for path in glob.iglob(paths, recursive=True):
            with open(path) as fd:
                schema = json.load(fd)
            includes = {**schema_includes, 'base': base_uri}
            # Key each schema by its path relative to schemas_path
            subpath = os.path.relpath(path, schemas_path)
            schemas[subpath] = _format_schema(schema, includes)

        async def root(request, path: str = ''):
            if path not in schemas:
                return sanic.response.raw(b'', status=404)
            return sanic.response.json(schemas[path])

        app = sanic.Sanic(app_name)
        app.add_route(root, "/")
        app.add_route(root, "/<path:path>")
        app.run(**sanic_config)
Helpers for traversing nested objects and interpolating variables in $id and $ref fields in the schema:
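import copy


def _iter_nested(obj):
    """Iterate over all nested (key, val, parent) triples in a dictionary"""
    for key, val in obj.items():
        yield key, val, obj
        # Descend into dicts, including dicts held in lists (e.g. "allOf")
        children = val if isinstance(val, list) else [val]
        for child in children:
            if isinstance(child, dict):
                yield from _iter_nested(child)


def _format_schema(schema, data):
    # Deep copy so the caller's nested dicts are not modified in place
    schema = copy.deepcopy(schema)
    for key, val, nested in _iter_nested(schema):
        if key in ('$id', '$ref') and isinstance(val, str):
            try:
                nested[key] = val.format(**data)
            except KeyError:
                # Unknown template variable: leave the value untouched
                continue
    return schema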
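One consequence of catching KeyError is worth spelling out: a value that mentions an unconfigured variable is left exactly as written, even if it also contains a configured one, so a misspelled alias stays in the schema silently. The sketch below isolates that behavior; the function name interpolate is hypothetical and only stands in for the try/except step of the helper above.

```python
def interpolate(value, variables):
    """Fill template variables in value, or return it unchanged on KeyError."""
    try:
        return value.format(**variables)
    except KeyError:
        # str.format raises before producing any partial result
        return value


variables = {"base": "http://localhost:5000"}

known = interpolate("{base}/defs.json", variables)
unknown = interpolate("{remote1}/defs.json", variables)
mixed = interpolate("{base}/{remote1}/defs.json", variables)

print(known)
print(unknown)
print(mixed)
```

If silent pass-through is not what you want, logging or re-raising inside the except branch would surface missing aliases earlier.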