The document is aimed at an engineer who is to implement the requirements it defines. The language and structure should be designed for that audience. The focus should be on minimal completeness and the removal of any ambiguity.
The document doesn't need to contain any justification of its design goals - it is assumed these are in line with both the business strategy and user requirements. It should, however, discuss how the data design meets the user requirements.
Anything in the document that isn't necessary for its purpose should be excluded. Out-of-scope text is a hostage to fortune as requirements change, and it adds cognitive overhead for the document's consumer.
The breadth of vocabulary used should be minimised; using different terms for the same concept adds to the cognitive overhead and could lead to misinterpretation.
All non-obvious terms should be explicitly defined.
Introduction and overview
Summary of user requirements
Those which have a bearing on the data model
Definition of non-obvious terms
Typical external schema
What we expect to get from a typical source.
High-level description in words of what we want to store.
Field-level description of how we intend to store the data. Expected to be tabular.
This is not the physical schema (i.e. a complete table design with column names) but a field-by-field description, using field names from the JSON schemata where appropriate.
Transformation rules & intermediate schemata
How we transform the external data, including any transformations needed to put it in intermediate storage prior to ingestion.
This section should make reference to the defined JSON schema elements we intend to use.
We should avoid prescriptive implementation detail.
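As an illustration only, one way to state transformation rules without prescribing an implementation is as a table of data: each target field names its source field and the conversion applied. The sketch below assumes hypothetical field names (`custId`, `fullName`, `balance`) and a hypothetical pence-based target field; none of these come from a real schema.

```python
from typing import Any, Callable

# Hypothetical rule table: target field -> (source field, conversion).
# All field names here are invented for illustration.
RULES: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "customer_id": ("custId", str),
    "name": ("fullName", str.strip),
    "balance_pence": ("balance", lambda v: round(float(v) * 100)),
}


def transform(source: dict[str, Any]) -> dict[str, Any]:
    """Apply every rule to one source record and return the target record."""
    return {
        target: convert(source[src])
        for target, (src, convert) in RULES.items()
    }
```

For example, `transform({"custId": 42, "fullName": " Ada ", "balance": "12.34"})` yields a record with a string identifier, a trimmed name and an integer balance in pence. In the document itself the same rules would normally appear as a table rather than code.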
The logical steps to take if the incoming information is incomplete, corrupt or out of range.
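The kind of logical steps meant here can be sketched as a single classification function: decide whether a record is corrupt, incomplete or out of range before anything is stored. This is a minimal sketch under assumed rules; the field names (`id`, `reading`, `note`), the range 0 to 100 and the outcome names are hypothetical, and the actual document would state its own.

```python
import json
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT_CORRUPT = "reject_corrupt"
    REJECT_INCOMPLETE = "reject_incomplete"
    REJECT_OUT_OF_RANGE = "reject_out_of_range"


# Hypothetical rules: field name -> (required, (min, max) or None).
RULES = {
    "id": (True, None),
    "reading": (True, (0.0, 100.0)),
    "note": (False, None),
}


def classify(raw: str) -> Outcome:
    """Classify one incoming JSON record against the rules above."""
    # Step 1: a record that cannot be parsed as a JSON object is corrupt.
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return Outcome.REJECT_CORRUPT
    if not isinstance(record, dict):
        return Outcome.REJECT_CORRUPT

    for name, (required, bounds) in RULES.items():
        # Step 2: a missing or null required field makes the record incomplete.
        if record.get(name) is None:
            if required:
                return Outcome.REJECT_INCOMPLETE
            continue
        # Step 3: bounded fields must be numeric and within their bounds.
        if bounds is not None:
            value = record[name]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return Outcome.REJECT_CORRUPT
            low, high = bounds
            if not low <= value <= high:
                return Outcome.REJECT_OUT_OF_RANGE
    return Outcome.ACCEPT
```

The order of the checks is itself a design decision worth stating in the document: here a record that is both incomplete and out of range is reported as whichever problem is found first in rule order.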
Examples (as logical ERDs)
Examples of source documents and how the data inferred from them would be stored logically, including the relationships between stored objects.