The document is aimed at the engineer who will implement the requirements it defines. Its language and structure should be designed for that audience. The focus should be on minimal completeness and the removal of ambiguity.

The document doesn't need to justify its design goals; these are assumed to be in line with both the business strategy and the user requirements. It should, however, discuss how the data design meets the user requirements.


Anything in the document that isn't necessary for its purpose should be excluded. Out-of-scope text is a hostage to fortune as requirements change, and it adds cognitive overhead for the document's reader.


The breadth of vocabulary used should be minimised; using different terms for the same concept adds to the cognitive overhead and could lead to misinterpretation.

All non-obvious terms should be explicitly defined.

Expected sections

  1. Introduction and overview

  2. Summary of user requirements

    Those which have a bearing on the data model

  3. Definition of non-obvious terms

  4. Typical external schema

    What we expect to get from a typical source.

  5. Conceptual schema

    High-level description in words of what we want to store.

  6. Logical schema

    Field-level description of how we intend to store the data. Expected to be tabular.

    This is not the physical schema (i.e. complete table design with column names) but a field-by-field description using field names from the JSON schemata where appropriate.

  7. Transformation rules & intermediate schemata

    How we transform the external data, including any transformations needed to put it in intermediate storage prior to ingestion.

    This section should make reference to the defined JSON schema elements we intend to use.

    We should avoid prescriptive implementation detail.

  8. Exception handling

    The logical steps to take if the incoming information is incomplete, corrupt or out of range.

  9. Examples (as logical ERDs)

    Examples of source documents and how the data inferred from them would be stored logically, including the relationships between stored objects.
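
As an illustration of the kind of logical steps section 8 (Exception handling) asks for, the three cases it names (incomplete, corrupt, out of range) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the field names, the numeric range and the `check_record` function are hypothetical and are not taken from any real schema. In the document itself these steps would normally be written as prose or a table, not as prescriptive code.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical rules; a real document would derive these from its logical schema.
REQUIRED_FIELDS = ("id", "recorded_at", "reading")
READING_RANGE = (0.0, 100.0)


@dataclass
class Outcome:
    status: str       # "accepted", "incomplete", "corrupt" or "out_of_range"
    detail: str = ""  # human-readable reason, empty when accepted


def check_record(record: dict[str, Any]) -> Outcome:
    """Classify one incoming record against the three exception cases."""
    # Incomplete: a required field is absent or null.
    missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
    if missing:
        return Outcome("incomplete", "missing: " + ", ".join(missing))

    # Corrupt: the field is present but cannot be read as a number.
    try:
        reading = float(record["reading"])
    except (TypeError, ValueError):
        return Outcome("corrupt", "reading is not numeric")

    # Out of range: readable, but outside the permitted bounds.
    low, high = READING_RANGE
    if not low <= reading <= high:
        return Outcome("out_of_range", f"reading {reading} outside [{low}, {high}]")

    return Outcome("accepted")
```

Checking the cases in a fixed order (incomplete, then corrupt, then out of range) means each record receives exactly one outcome, which removes ambiguity about how to report a record that fails in more than one way.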
