
@seagreen
Last active January 13, 2017 03:48
Designing Quality JSON Schema Errors

Intro

The JSON Schema spec doesn't say anything about how errors should be constructed (which is absolutely the right decision).

This means that each library author has to work out individually how to construct informative errors.

This document is meant to sketch out some best practices.

Example

Schema

{
  "properties": {
    "foo": {
      "items": {
        "type": "string"
      }
    }
  }
}

Data

{
  "foo": [ null ]
}

Some Options

(Examples in Haskell)

1. The simplest possible: a bare Bool

-- All a Bool can tell the caller is whether validation passed.
exampleError :: Bool
exampleError = False

This makes for totally unhelpful error messages.
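For contrast, here's a sketch of option 1 as an actual validator for the example. It assumes a hypothetical pared-down `Value` and `Schema` (only "properties", "items" and a single-name "type"), and uses `Data.Map` in place of `HashMap` so it only needs GHC's bundled `containers` package. All it can ever report is `True` or `False`:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a JSON value; a real library has its own type.
data Value
  = Null
  | Bool Bool
  | Number Double
  | String String
  | Array [Value]
  | Object (Map.Map String Value)
  deriving (Show, Eq)

-- Just enough of a schema for the example: "properties", "items" and "type"
-- (with "type" limited to a single type name).
data Schema = Schema
  { properties :: Map.Map String Schema
  , items      :: Maybe Schema
  , typeName   :: Maybe String
  }

emptySchema :: Schema
emptySchema = Schema Map.empty Nothing Nothing

-- The JSON Schema type name of a value ("integer" is not distinguished here).
jsonType :: Value -> String
jsonType v = case v of
  Null     -> "null"
  Bool _   -> "boolean"
  Number _ -> "number"
  String _ -> "string"
  Array _  -> "array"
  Object _ -> "object"

-- Option 1: validation that only answers yes or no.
isValid :: Schema -> Value -> Bool
isValid schema value = propertiesOk && itemsOk && typeOk
  where
    -- Each listed property that is present must match its subschema.
    propertiesOk = case value of
      Object fields ->
        and [ isValid sub v
            | (key, sub) <- Map.toList (properties schema)
            , Just v <- [Map.lookup key fields] ]
      _ -> True
    -- Every array element must match the "items" subschema.
    itemsOk = case (items schema, value) of
      (Just sub, Array xs) -> all (isValid sub) xs
      _                    -> True
    typeOk = maybe True (== jsonType value) (typeName schema)

-- The schema and data from the example above.
exampleSchema :: Schema
exampleSchema = emptySchema
  { properties = Map.singleton "foo"
      (emptySchema { items = Just (emptySchema { typeName = Just "string" }) }) }

exampleData :: Value
exampleData = Object (Map.singleton "foo" (Array [Null]))
```

`isValid exampleSchema exampleData` evaluates to `False`, and nothing in that result points at "foo" or at index 0.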

2. Reporting only the bare minimum needed to derive everything else

Note: I think it's important that the errors returned by individual validators don't know about the schema-level errors they will eventually be used to produce. That way they can be reused in other schemas in the future without causing problems. Here we achieve that by parameterizing PropertiesValFailure and ItemsValFailure with the schema-level error type (which in this case ends up being ValidationFailure).
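A minimal sketch of why the parameterization helps, assuming a hypothetical later spec with its own schema-level error type. `CurrentFailure`, `LaterFailure` and `upgrade` are made-up names for illustration, and `Data.Map` stands in for `HashMap`. The same `PropertiesValFailure` slots into both top-level types unchanged, and because it is a `Functor` over `err`, converting the nested errors from one to the other is just `fmap`:

```haskell
{-# LANGUAGE DeriveFunctor #-}

import qualified Data.Map.Strict as Map

-- The "properties" error type never mentions a schema-level error type;
-- `err` is filled in by whichever schema uses it.
data PropertiesValFailure err = PropertiesValFailure (Map.Map String err)
  deriving (Show, Eq, Functor)

-- Hypothetical schema-level error type for the current spec.
data CurrentFailure
  = CurrentProperties (PropertiesValFailure CurrentFailure)
  | CurrentLeaf
  deriving (Show, Eq)

-- Hypothetical schema-level error type for a later spec that adds a keyword.
-- It reuses PropertiesValFailure without any changes.
data LaterFailure
  = LaterProperties (PropertiesValFailure LaterFailure)
  | LaterLeaf
  | LaterNewKeywordFailure
  deriving (Show, Eq)

-- Illustrative conversion: the nested errors are mapped with `fmap`,
-- since PropertiesValFailure is a Functor over its error type.
upgrade :: CurrentFailure -> LaterFailure
upgrade (CurrentProperties failures) = LaterProperties (fmap upgrade failures)
upgrade CurrentLeaf                  = LaterLeaf
```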

{-# LANGUAGE OverloadedStrings #-}

import           Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
import           Data.Text (Text)

-- | Declare the error type for the "properties" validator.
--
-- It's a hashmap of "properties" keys to the errors that resulted from them.
--
-- It takes a type argument (`err`) because we want to be agnostic about
-- what errors the schema it's eventually used in can produce. That way
-- it can be used by later JSON Schema specifications as well as the current one.
data PropertiesValFailure err = PropertiesValFailure (HashMap Text err)
    deriving (Show)

-- | Declare the error type for the "items" validator.
--
-- It's a hashmap of indexes to the errors that resulted from the data
-- at that index.
data ItemsValFailure err = ItemsValFailure (HashMap Int err)
    deriving (Show)

-- | Declare the error type for the whole schema.
data ValidationFailure
    = InvalidProperties (PropertiesValFailure ValidationFailure)
    | InvalidItems (ItemsValFailure ValidationFailure)
    -- ^ In a real JSON Schema lib we'd have a lot more errors than this
    -- to handle (allOf, anyOf, etc.) but this is just an example.
    | LeafFailure
    -- ^ LeafFailure is what we use for "type", as well as any other validators
    -- that themselves can't contain errors (like "maximum", "minimum", etc.)
    -- The value of validators like "type" can be derived from the starting schema
    -- and the rest of the error message (e.g.
    -- `InvalidProperties (PropertiesValFailure (HashMap.singleton "quux" LeafFailure))`
    -- would mean that the validator that caused the error is the value of the
    -- schema object at the "properties/quux" key).
    deriving (Show)

-- | Validation produces a list of `ValidationFailure`s, up to one for each top-level
-- validator.
exampleError :: [ValidationFailure]
exampleError =
    [ InvalidProperties (PropertiesValFailure (
          HashMap.singleton "foo" (
              InvalidItems (ItemsValFailure (HashMap.singleton 0 LeafFailure))
          )
      ))
    ]

main :: IO ()
main = print exampleError

This provides all the information we need to reconstruct what happened, which is good! But the messages are still hard to read at a glance. What other information should be included?
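To show where this structure comes from, here's a sketch of a validator that builds it for the example. It assumes a hypothetical pared-down `Value` and `Schema` (only "properties", "items" and a single-name "type"), with `Data.Map` and `String` standing in for `HashMap` and `Text`. Because each key or index in these error types holds a single nested error, the sketch keeps only the first failure from each subschema; that is a simplification of this sketch, not something the design above specifies:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical minimal JSON value and schema types.
data Value
  = Null
  | Bool Bool
  | Number Double
  | String String
  | Array [Value]
  | Object (Map.Map String Value)
  deriving (Show, Eq)

data Schema = Schema
  { properties :: Map.Map String Schema
  , items      :: Maybe Schema
  , typeName   :: Maybe String
  }

emptySchema :: Schema
emptySchema = Schema Map.empty Nothing Nothing

jsonType :: Value -> String
jsonType v = case v of
  Null     -> "null"
  Bool _   -> "boolean"
  Number _ -> "number"
  String _ -> "string"
  Array _  -> "array"
  Object _ -> "object"

-- The error types from the gist, with Data.Map in place of HashMap.
data PropertiesValFailure err = PropertiesValFailure (Map.Map String err)
  deriving (Show, Eq)

data ItemsValFailure err = ItemsValFailure (Map.Map Int err)
  deriving (Show, Eq)

data ValidationFailure
  = InvalidProperties (PropertiesValFailure ValidationFailure)
  | InvalidItems (ItemsValFailure ValidationFailure)
  | LeafFailure
  deriving (Show, Eq)

-- Up to one failure per top-level validator, as in `exampleError`.
validate :: Schema -> Value -> [ValidationFailure]
validate schema value = propertiesFailures ++ itemsFailures ++ typeFailures
  where
    propertiesFailures = case value of
      Object fields -> wrapIfAny (InvalidProperties . PropertiesValFailure) (Map.fromList
        [ (key, failure)
        | (key, sub) <- Map.toList (properties schema)
        , Just v <- [Map.lookup key fields]
        , Just failure <- [firstFailure sub v] ])
      _ -> []
    itemsFailures = case (items schema, value) of
      (Just sub, Array xs) -> wrapIfAny (InvalidItems . ItemsValFailure) (Map.fromList
        [ (i, failure) | (i, x) <- zip [0 ..] xs, Just failure <- [firstFailure sub x] ])
      _ -> []
    typeFailures = case typeName schema of
      Just expected | expected /= jsonType value -> [LeafFailure]
      _ -> []

-- Keep only the first failure from a subschema (simplification, see above).
firstFailure :: Schema -> Value -> Maybe ValidationFailure
firstFailure sub v = case validate sub v of
  failure : _ -> Just failure
  []          -> Nothing

-- Report a validator's failure only if at least one nested failure occurred.
wrapIfAny :: (Map.Map k ValidationFailure -> ValidationFailure)
          -> Map.Map k ValidationFailure -> [ValidationFailure]
wrapIfAny wrap failures
  | Map.null failures = []
  | otherwise         = [wrap failures]

exampleSchema :: Schema
exampleSchema = emptySchema
  { properties = Map.singleton "foo"
      (emptySchema { items = Just (emptySchema { typeName = Just "string" }) }) }

exampleData :: Value
exampleData = Object (Map.singleton "foo" (Array [Null]))
```

`validate exampleSchema exampleData` evaluates to the same shape as `exampleError`: `InvalidProperties` at "foo", wrapping `InvalidItems` at index 0, wrapping `LeafFailure`.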

@handrews

handrews commented Jan 4, 2017

Nice! This is a great topic for discussion. Not knowing Haskell, I'm a little unsure whether I'm reading things correctly. Could you include the actual text output that the code is intended to produce?

@seagreen
Author

seagreen commented Jan 4, 2017

@handrews:

(First, a side note: I went through the code, fixed errors, changed names, and commented everything. It should be more self-explanatory now.)

"Could you include the actual text output that the code is intended to produce?"

This actually gets at one of my questions: would it be better to produce human-readable messages along with the original errors, or to produce them later? Either way, in this case we aren't including them up front, so we would have to produce them later with a function of type ValidationFailure -> String.

Here, the most naive version of this function would produce:

Validation failed:
  In the validator "properties" at the "foo" key:
    In the validator "items" at the 0 index:
      An error was found.
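A naive renderer along these lines might look like the following sketch. `renderAt`, `describe` and `report` are hypothetical names; the error types are copied from the gist with `Data.Map` standing in for `HashMap`. Keys are quoted with `show`, so non-ASCII keys would come out escaped:

```haskell
import qualified Data.Map.Strict as Map

-- The error types from the gist, with Data.Map in place of HashMap.
data PropertiesValFailure err = PropertiesValFailure (Map.Map String err)
data ItemsValFailure err = ItemsValFailure (Map.Map Int err)

data ValidationFailure
  = InvalidProperties (PropertiesValFailure ValidationFailure)
  | InvalidItems (ItemsValFailure ValidationFailure)
  | LeafFailure

-- Render one failure as lines of text indented by `n` spaces, adding two
-- more spaces for each level of nesting.
renderAt :: Int -> ValidationFailure -> [String]
renderAt n failure = case failure of
    InvalidProperties (PropertiesValFailure failures) ->
      concat
        [ line ("In the validator \"properties\" at the " ++ show key ++ " key:")
            : renderAt (n + 2) nested
        | (key, nested) <- Map.toList failures ]
    InvalidItems (ItemsValFailure failures) ->
      concat
        [ line ("In the validator \"items\" at the " ++ show i ++ " index:")
            : renderAt (n + 2) nested
        | (i, nested) <- Map.toList failures ]
    LeafFailure -> [line "An error was found."]
  where
    line s = replicate n ' ' ++ s

-- A ValidationFailure -> String function of the kind described above.
describe :: ValidationFailure -> String
describe = unlines . renderAt 0

-- The full report for the list that validation returns.
report :: [ValidationFailure] -> String
report failures = unlines ("Validation failed:" : concatMap (renderAt 2) failures)

exampleError :: [ValidationFailure]
exampleError =
  [ InvalidProperties (PropertiesValFailure
      (Map.singleton "foo" (InvalidItems (ItemsValFailure (Map.singleton 0 LeafFailure))))) ]
```

`putStr (report exampleError)` prints the four lines shown above.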

A more sophisticated version could look up the corresponding locations in the schema ("properties/foo/items/type") and the data ("foo/0"), and replace "An error was found." with something like: type: "null" doesn't match type: "string".

I was going to ask another question here: should the main validation functions distinguish the leaf validators from each other (e.g. with TypeValFailure, MaximumValFailure, etc.) or not? But I'm becoming convinced that they should, and that a single catch-all LeafFailure isn't the right answer. While it's true that distinct constructors carry information that could be derived from the rest of the error, deriving it adds an unnecessary complication to producing good error messages.
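Here's a sketch of what distinct leaf failures could look like. The constructor names `TypeValFailure` and `MaximumValFailure` come from the question above, but the fields they carry (expected and actual type names; the limit and the offending number) and the `leafMessage` function are assumptions made for illustration, and `Data.Map` stands in for `HashMap`. With that information in the error itself, the message from the more sophisticated renderer can be produced without going back to the schema or data:

```haskell
import qualified Data.Map.Strict as Map

data PropertiesValFailure err = PropertiesValFailure (Map.Map String err)
  deriving (Show, Eq)

data ItemsValFailure err = ItemsValFailure (Map.Map Int err)
  deriving (Show, Eq)

-- Hypothetical variant of ValidationFailure with one constructor per leaf
-- validator instead of a single LeafFailure. The payloads are assumptions.
data ValidationFailure
  = InvalidProperties (PropertiesValFailure ValidationFailure)
  | InvalidItems (ItemsValFailure ValidationFailure)
  | TypeValFailure String String     -- expected type, actual type
  | MaximumValFailure Double Double  -- "maximum" value, offending number
  deriving (Show, Eq)

-- Leaf messages no longer require looking anything up in the schema or data.
leafMessage :: ValidationFailure -> Maybe String
leafMessage failure = case failure of
  TypeValFailure expected actual ->
    Just ("type: " ++ show actual ++ " doesn't match type: " ++ show expected)
  -- Assumes a non-exclusive "maximum".
  MaximumValFailure limit actual ->
    Just (show actual ++ " is greater than the maximum of " ++ show limit)
  _ -> Nothing  -- not a leaf; its nested failures carry the messages

-- The example error, now recording which leaf validator failed and why.
exampleError :: [ValidationFailure]
exampleError =
  [ InvalidProperties (PropertiesValFailure
      (Map.singleton "foo"
        (InvalidItems (ItemsValFailure
          (Map.singleton 0 (TypeValFailure "string" "null")))))) ]
```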

@seagreen
Author

seagreen commented Jan 4, 2017

Another question: is "invalid", "failure", or "error" the best name for what individual validators produce? And what about the type for the whole schema?

@seagreen
Author

Docs on how the Python jsonschema library handles errors are here: http://python-jsonschema.readthedocs.io/en/latest/errors/
