pfrazee/schemas-design-doc-draft.md

Schemas Design Doc (draft)

Please note: The following document is an initial draft proposal. All decisions are subject to change. Our present goal is to collect feedback and iterate upon this document. Please feel free to share your suggestions and concerns.
Overview

ADX is a federated network for distributing data. It leverages cryptographic signatures and hashes to distribute authenticity proofs along with the data, enabling each node to transact upon the data independently of the data's origin. ADX might therefore be described as an Internet-native database in which records are replicated across nodes.
As a consequence of relying on authenticity proofs, ADX must exchange "canonical" records. That is, ADX records must be transmitted in their original encoding and structure in order to validate the signatures and hashes which comprise the proofs. This stands in contrast to the RESTful model of the Web in which "representations" of records are exchanged and therefore may be constructed at the time of exchange. While ADX records may be stored and queried in a variety of forms, they must be transmitted in their canonical form.
The canonical form implies an encoding, layout, and underlying data model which is shared by all ADX nodes. Again, this stands in contrast to Internet applications which interoperate through messaging. ADX is a record-oriented network and must provide a sufficiently-general data model for a wide variety of applications. Developers will likely view ADX as a database within their software stack, and while records can be copied into other databases and systems, any records to be transmitted must be written in the canonical form to the ADX systems.
In addition to the data model of the canonical form, ADX applications must also agree upon the semantics of the exchanged records. These are generally referred to as "schemas." In ADX, schemas inform the data model and vice versa; therefore this document encompasses encoding, data model, and semantics.
Since the start of Bluesky, engineers both inside and outside the team have highlighted schemas as a linchpin of the project's success. This interest reflects many factors: the impact of schemas on developer experience, their relevance to the evolvability of the network, and the strong opinions held by SMEs in the space. Schemas are one of the most hotly debated topics in the community, and a good solution will consider as many of the known approaches as possible.
Three dominant philosophies have emerged in decentralized networks: global term definitions via RDF, convention-oriented freeform objects, and networked programs such as Ethereum's smart contracts. It's worth giving each a brief overview and discussing their strengths and weaknesses:

  
   Global terms (RDF)
   
   RDF is a highly general model for creating unambiguous semantics. It uses a directed graph to organize all information into "triples" of facts. Many developers are only aware of RDF via JSON-LD, a format which provides an object-document abstraction over RDF while preserving the graph model.

RDF's strengths are its rigor, its standards-driven governance, its wide adoption in the Fediverse, and its flexibility. Its weaknesses are its complexity, poor DX, and unfamiliarity outside of certain developer niches.

JSON-LD has demonstrated that well-designed tooling can overcome the weaknesses of RDF when consuming schemas, but authoring new schemas (vocabularies) remains a daunting task.
   
  
   Freeform objects
   
   Convention-driven systems have a rich history in the indie-hacker culture and are often proposed as a solution to decentralized networks. These models typically use an object-document model and leave developers free to populate the document however they see fit, often with a few baked-in conventions such as indexing upon a "type" attribute.

The strengths of freeform objects include evolvability, flexibility, and ease of understanding. The weaknesses include the slow development of conventions, lack of coordination between separate teams/orgs, and frequent incompatibilities between applications.

Freeform objects often rely upon application libraries and can enable bazaar-style innovation. However, the innovation process can often be frustrating for end-users and developers as incompatibilities surface frequently and can be slow to resolve.
   
  
   Networked programs
   
   Blockchain-based systems like Ethereum have recently advanced the use of a shared runtime which abstracts the network. Programs on the runtime (smart contracts) encapsulate state with a set of APIs which enforce schemas and business logic.

Networked programs benefit from their intuitive nature: developers can think of them like regular programs, or perhaps like Web APIs. The bytecode is publicly available and can be connected to the source code to clearly explain their behavior. However, the current models suffer from a great deal of runtime overhead, the gas-fee incentive to pre-optimize, and subtle complexities which lead to bugs.

Bluesky is not using a blockchain; however, there are interesting lessons to be learned from networked programs. Declarative, machine-readable definitions could be distributed over the network to instruct general-purpose nodes to enforce useful behaviors.
   
  
This proposal builds upon RDF’s global terms with tooling inspired by the freeform objects philosophy. The intent is to provide optional-but-recommended mechanics which assist developers without overly constraining them.
A slightly richer set of "value types" is used in the encoding of ADX records than is common in formats such as JSON. This enables ADX records to self-describe with some higher-level semantics, facilitating schema-free operation.
Schemas provide additional semantics, descriptions, constraints, and properties to ADX's value types. They assist in the interpretation and consistent usage of ADX records. Schemas may be published to the ADX network in a machine-readable form, enabling convenient distribution and access by software. However, most uses of schemas are optional, ensuring that the network remains flexible. Any software which depends on network-accessible schemas must additionally provide a fallback behavior if a schema is not available.
The semantics of schemas are based on RDF. This serves multiple goals: to benefit from the rigor of RDF, to leverage existing RDF software and techniques, and to ensure interoperability with systems outside of ADX. Many of the systems in this document can be described as a DSL over RDF.
Objectives


Schema evolvability. The schema system must enable developers to extend, evolve, and repurpose the network and its data.

Ideally this should occur with minimal upfront social consensus – developers should not need to convince a "spec owner" to modify their schema in order to make changes.


Developer convenience. Schemas and their tooling should be obvious, easy to use, and empowering.

Tools should not overburden developers with strictness or busy-work. When there are guard-rails, those rails should inform the developer, not control them.
We should recognize that the deployment of new schemas is a common task and cannot depend on slow-moving standards bodies.


Reduced incompatibilities. Incompatibilities are a coordination challenge that emerges during independent development. They often result in an inconsistent user experience: markup showing up in text, features missing from content (eg absent embedded media), or unexpected behaviors between applications. These issues affect users and burden developers, who must introduce features without creating compatibility issues. Tooling which helps teams coordinate can improve evolvability, as developers can more easily predict the effects of their software.
Avoid NIH. There is a large body of prior work available for schemas. When possible, this system should leverage existing technology in order to benefit from its software, corpus of specifications, and expertise.

When developing a novel solution, the reasoning for divergence should be clear and justified.


Hash-friendly encodings. In order to validate authenticity proofs, ADX records must reliably serialize to a canonical form.

While JSON has wide adoption, it fails to provide a canonical encoding without additional rules such as sorted keys, discarded duplicate keys, and string-encoded decimal numbers. This requirement demands either a modified JSON encoding or some other encoding format.


Unambiguous terms. Applications should agree upon the meaning of the values being shared.

Ambiguous terms – eg keynames in documents which are not well-defined – risk creating collisions which are difficult to resolve (and often difficult to detect and debug).


User-friendly descriptions. ADX software must provide UIs such as permission prompts which describe the data being affected.
Secure trust model. Any schema information must have a clear and secure trust model.

This is trivial for most applications as they choose which schemas to integrate and then act upon the records according to their own validation. However, in some situations the schemas are chosen by third-parties such as when providing permission prompts, enabling an application to misrepresent actions to the user or to the system. All usages of schemas must consider the effects of malicious actors.
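The hash-friendly encodings objective can be made concrete with a short example. The following minimal Python sketch uses only the standard library and shows why JSON text cannot be hashed as-is. The helper names (`naive_hash`, `canonical_hash`) are hypothetical. The sketch applies only two of the extra rules mentioned in that objective: sorted keys, and duplicate-key handling. It rejects duplicate keys rather than discarding them. Number normalization is deliberately left out, so `1` and `1.0` would still hash differently.

```python
import hashlib
import json


def _reject_duplicate_keys(pairs):
    """object_pairs_hook for json.loads: fail instead of silently keeping the last key."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)


def naive_hash(doc: str) -> str:
    """Hash the JSON text exactly as it was received."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


def canonical_hash(doc: str) -> str:
    """Re-serialize with sorted keys and fixed separators before hashing."""
    value = json.loads(doc, object_pairs_hook=_reject_duplicate_keys)
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = '{"text": "hi", "likes": 1}'
b = '{"likes":1,"text":"hi"}'
print(naive_hash(a) == naive_hash(b))          # False: same data, different bytes
print(canonical_hash(a) == canonical_hash(b))  # True once the extra rules are applied
```

Every rule of this kind must be specified and implemented identically by every node, which is the burden the objective describes.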


Core concepts

Data encoding (CBOR)

ADX records are encoded using CBOR.
Value types

"Value types" establish the kinds of values in the data model.
The data model supports a subset of CBOR's available value types:

  
   null
   
   A CBOR simple value (major type 7) with additional information 22 (null), ie the single byte 0xf6.
   
  
   boolean
   
   A CBOR simple value (major type 7) with additional information 21 (true) or 20 (false), ie the single byte 0xf5 or 0xf4 respectively.
   
  
   integer
   
   A CBOR integer (major type 0 or 1), choosing the shortest byte representation. 
   
  
   float
   
   A CBOR floating-point number (major type 7). All floating-point values MUST be encoded as 64-bit doubles (additional information 27), even for integral values.
   
  
   string
   
   A CBOR text string (major type 3), which is UTF-8 encoded.
   
  
   list
   
   A CBOR array (major type 4), where each element of the list is added, in order, as a value of the array according to its type.
   
  
   map
   
   A CBOR map (major type 5), where each entry is represented as a member of the CBOR map. Each entry key is encoded as a CBOR text string (major type 3).
   
  
   datetime
   
   A CBOR tagged value (major type 6) with tag 0, wrapping a date-time text string in the standard format of RFC 3339 (a profile of ISO 8601).
   
  
   uri
   
   A CBOR tagged value (major type 6) with tag 32, wrapping a URI text string as defined by RFC 3986.
   
  
TODO: do we need uint53, int54?
TODO: do we need bignums?
TODO: do we need binary? I'm inclined to say no: we face problems when people stuff binary into records rather than using blobs
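A small encoder illustrates the value types above. This is a sketch under stated assumptions, not ADX's implementation. `DateTime` and `Uri` are hypothetical wrapper classes used to mark tagged strings. Map keys are ordered by the bytewise order of their encoded form, which is the core deterministic encoding of RFC 8949, §4.2.1. That ordering is an assumption, because this document does not yet specify a key order.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DateTime:
    """Hypothetical marker for the datetime value type (CBOR tag 0)."""
    value: str


@dataclass(frozen=True)
class Uri:
    """Hypothetical marker for the uri value type (CBOR tag 32)."""
    value: str


def _head(major: int, arg: int) -> bytes:
    """Initial byte plus argument, using the shortest available form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, arg)
    raise OverflowError("value needs a bignum, which the data model does not define")


def encode(value) -> bytes:
    if value is None:
        return b"\xf6"  # major type 7, additional information 22
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return b"\xf5" if value else b"\xf4"  # simple values 21 / 20
    if isinstance(value, int):
        # Major type 0 for n >= 0; major type 1 carries -1 - n for negatives.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always 64-bit (additional information 27)
    if isinstance(value, DateTime):
        return b"\xc0" + encode(value.value)  # tag 0 wrapping a text string
    if isinstance(value, Uri):
        return b"\xd8\x20" + encode(value.value)  # tag 32 wrapping a text string
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(item) for item in value)
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("map keys must be strings")
        # Assumption: deterministic order = bytewise order of the encoded keys.
        entries = sorted((encode(k), encode(v)) for k, v in value.items())
        return _head(5, len(entries)) + b"".join(k + v for k, v in entries)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


print(encode({"text": "hi", "likes": 500}).hex())  # a26474657874626869656c696b65731901f4
```

Note that the `text` key sorts before `likes`. With bytewise ordering of encoded keys, shorter keys sort first because their length prefix is smaller.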
Data types

"Data types" establish the kinds of properties in the data model. They are used in schemas to define how properties should be interpreted.
Every data type defines an interpretation for each value type. In cases where no useful interpretation can be created, the interpretation is mapped to null. For example, a datetime property can be validly interpreted from datetime, string, and integer values (see Appendix A); for all other values, the interpretation resorts to null.
All data types (builtin and user-defined) follow the RDF model. Consequently, the type-identifiers for the builtin simple types and user-defined record types expand to URIs. They are mapped to shortened terms for use in records.
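The interpretation rules described above, and tabulated in Appendix A, amount to a small coercion function. The minimal Python sketch below covers the any, boolean, integer, float, string, and datetime rows. `DateTime`, `Uri`, and `interpret` are hypothetical names, and `None` stands for null. Appendix A does not say whether integer Unix-epoch values are seconds or milliseconds, so this sketch assumes seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class DateTime:
    """Hypothetical marker for a decoded datetime value (CBOR tag 0)."""
    value: str


@dataclass(frozen=True)
class Uri:
    """Hypothetical marker for a decoded uri value (CBOR tag 32)."""
    value: str


def _parse_iso(text: str) -> Optional[datetime]:
    """Parse an ISO-8601 date-time, returning None when the text is not one."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # fromisoformat() before Python 3.11 rejects "Z"
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


def interpret(data_type: str, value: Any) -> Any:
    """Interpret a decoded value under a data type; None stands for null."""
    if data_type == "any":
        return value
    if data_type == "boolean":
        if isinstance(value, bool):
            return value
        if isinstance(value, int) and value in (0, 1):  # 0 -> false, 1 -> true
            return value == 1
        return None
    if data_type == "integer":
        if isinstance(value, bool):  # false -> 0, true -> 1
            return int(value)
        return value if isinstance(value, int) else None
    if data_type == "float":
        if isinstance(value, bool):
            return 1.0 if value else 0.0
        return float(value) if isinstance(value, (int, float)) else None
    if data_type == "string":
        if isinstance(value, str):
            return value
        if isinstance(value, (DateTime, Uri)):
            return value.value
        return None
    if data_type == "datetime":
        if isinstance(value, bool):
            return None
        if isinstance(value, int):
            # Assumption: integers are Unix epoch seconds.
            return datetime.fromtimestamp(value, tz=timezone.utc)
        if isinstance(value, str):
            return _parse_iso(value)
        if isinstance(value, DateTime):
            return _parse_iso(value.value)
        return None
    raise ValueError(f"data type not covered by this sketch: {data_type}")
```

Mapping unusable values to null, rather than raising, keeps readers tolerant of records written by software with a different view of the schema.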
Builtin data types

The data model supports a set of simple data types:

  
| Data type | Primary value type | RDF term |
|---|---|---|
| any | – | – |
| boolean | boolean | xsd:boolean |
| integer | integer | xsd:integer |
| float | float | xsd:double |
| string | string | xsd:string |
| datetime | datetime | xsd:dateTime |
| date | datetime | xsd:date |
| time | datetime | xsd:time |
| uri | uri | xsd:anyURI |
   
  
The data model also supports a set of complex types:

  
| Data type | Primary value type | RDF term | Description |
|---|---|---|---|
| record | map | rdfs:Resource | A key/value document. |
| list | list | rdf:Seq | An ordered array. |
   An ordered array.
   
  
The complex types can contain all other simple or complex types. As explained in "Data layout," all information is published in records, and records can contain records.
User-defined record types

Records may be assigned custom types. New record types are created by publishing a schema.
A record type is any valid URI. Tools may attempt to download a machine-readable schema from the URI, but this is not required.
Standard record fields

Records contain the following standard fields:

  
| Field | Type | Description |
|---|---|---|
| type | string | Declares the type of a record. Must be a valid Schema ID or URI. |
   
  
Data layout

The data layout establishes the units of network-transmissible data. It includes the following three groupings:

Repository. The dataset of a single actor; contains a set of collections.
Collection. An ordered list of records.
Record. A key/value document.

These groupings establish addressability as well as the available network queries. For instance, a Repository is addressed by its DID and can be fetched in its entirety, while a Collection is addressed by a DID + its ID and can be fetched partially with range queries. It is not possible to transmit smaller units of data than these three groupings; for instance, a subset of a record cannot be requested over the network.
Additional properties and behaviors for each grouping are defined below.
Repository

Repositories are the dataset of a single "actor" (ie user) in the ADX network. Every user has a single repository which is identified by a DID.
Collection

A collection is an ordered list of records. Every collection has a type and is identified by the Schema ID of its type. Collections may contain records of any type and cannot enforce any constraints on them.
Record

A record is a key/value document. It is the smallest unit of data which can be transmitted over the network. Every record has a type and is identified by a key which is chosen by the writing software.
Builtin collections

The builtin "Definitions collection," identified by adxs.org:Definitions, is used to store schema definitions.
Schemas

Schemas are documents which declare new types. They define:

Semantic meanings,
Descriptive metadata,
Shape-constraints, and
Behavior hints.

The primary purpose of schemas is to help developers reach consensus on how they interact on the system. Their secondary purpose is to provide tooling which reduces bugs and incompatibilities; however, most tooling is chosen by applications and is therefore optional.
Schema IDs

All schemas are published as records in the builtin adxs.org:Definitions collection. This makes it possible to reference schemas using only the repository name and schema keyname. We call this the "Schema ID".
```abnf
schema-id   = repo-name ":" schema-name
repo-name   = [ reg-name "@" ] reg-name
schema-name = reg-name
```

reg-name is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2.
For example, the schema with ID example.com:Song can be found in the example.com repository, in the adxs.org:Definitions collection under the Song key (ie at adx://example.com/adxs.org:Definitions/Song).
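Parsing a Schema ID and locating its definition record can be sketched in Python. `parse_schema_id` and `definition_url` are hypothetical helpers. The regular expression transcribes `reg-name` from RFC 3986. As an assumption, it requires at least one character per name, although RFC 3986 allows an empty reg-name.

```python
import re
from typing import NamedTuple

# reg-name per RFC 3986 §3.2.2 (unreserved / pct-encoded / sub-delims).
# Assumption: at least one character is required for each name.
REG_NAME = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})+"
SCHEMA_ID_RE = re.compile(rf"(?P<repo>(?:{REG_NAME}@)?{REG_NAME}):(?P<name>{REG_NAME})")
DEFINITIONS_COLLECTION = "adxs.org:Definitions"


class SchemaId(NamedTuple):
    repo: str  # eg "example.com" or "bob@work.com"
    name: str  # key of the schema record in the Definitions collection


def parse_schema_id(text: str) -> SchemaId:
    match = SCHEMA_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid Schema ID: {text!r}")
    return SchemaId(match.group("repo"), match.group("name"))


def definition_url(schema_id: SchemaId) -> str:
    """adx URL of the record that holds the schema definition."""
    return f"adx://{schema_id.repo}/{DEFINITIONS_COLLECTION}/{schema_id.name}"
```

Because `reg-name` cannot contain ":" or "@", the separators in a Schema ID are unambiguous, and no backtracking tricks are needed.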
"adx" URL scheme

The adx URL scheme is used to address records in the ADX network.
```abnf
adx-url   = "adx://" authority path [ "?" query ] [ "#" fragment ]
authority = repo-name / did
repo-name = [ reg-name "@" ] reg-name
path      = [ "/" schema-id [ "/" record-id ] ]
coll-ns   = reg-name
coll-id   = 1*pchar
record-id = 1*pchar
```

did is defined in https://w3c.github.io/did-core/#did-syntax.
reg-name is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2.
pchar is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.3.
query is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.4.
fragment is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.5.
schema-id is defined in "Schema IDs."
The fragment segment only has meaning if the URL references a record. Its value maps to a subrecord with the matching "id" value.
Some example adx URLs:

  
| Addresses | Example URL |
|---|---|
| Repository | adx://bob.com |
| Repository | adx://bob@work.com |
| Repository | adx://did:ion:EiAnKD8-jfdd0MDcZUjAbRgaThBrMxPTFOxcnfJhI7Ukaw |
| Collection | adx://bob.com/example.com:songs |
| Record | adx://bob.com/example.com:songs/3yI5-c1z-cc2p-1a |
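The following hedged Python sketch splits an adx URL into the parts defined by the grammar above. `parse_adx_url` and `AdxUrl` are hypothetical names. The sketch checks only the overall shape: an authority, at most two path segments, and an optional query and fragment. It does not validate DIDs, repo names, or `pchar` characters.

```python
from typing import NamedTuple, Optional


class AdxUrl(NamedTuple):
    authority: str            # repo name (eg "bob.com") or a DID
    schema_id: Optional[str]  # Schema ID of the collection's type
    record_id: Optional[str]
    query: Optional[str]
    fragment: Optional[str]   # only meaningful when record_id is set

    @property
    def scope(self) -> str:
        """Which grouping the URL addresses: repository, collection, or record."""
        if self.record_id is not None:
            return "record"
        if self.schema_id is not None:
            return "collection"
        return "repository"


def _split_off(text: str, sep: str):
    """Split at the first sep, returning (head, tail), with tail None if sep is absent."""
    if sep in text:
        head, tail = text.split(sep, 1)
        return head, tail
    return text, None


def parse_adx_url(url: str) -> AdxUrl:
    prefix = "adx://"
    if not url.startswith(prefix):
        raise ValueError(f"not an adx URL: {url!r}")
    rest, fragment = _split_off(url[len(prefix):], "#")
    rest, query = _split_off(rest, "?")
    authority, path = _split_off(rest, "/")
    if not authority:
        raise ValueError("missing authority")
    segments = path.split("/") if path else []
    if len(segments) > 2 or any(not segment for segment in segments):
        raise ValueError(f"unexpected path: {path!r}")
    schema_id = segments[0] if segments else None
    record_id = segments[1] if len(segments) == 2 else None
    return AdxUrl(authority, schema_id, record_id, query, fragment)
```

The `scope` property mirrors the three groupings (repository, collection, record). These are also the scopes used for UCAN resources under "Permission interfaces and resource-descriptions."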
   
  
ADX-Schema (ADXS)

ADX-Schema (or "ADXS") is a DSL for schemas in ADX.
It can be helpful to start from an example:
```json
{
  "name": "Post",
  "extends": "record",
  "comment": "A little chirp.",
  "props": {
    "text": {
      "type": "string",
      "required": true,
      "maxLength": 255
    },
    "extendedText": "string",
    "postedFrom": "gis.org:Location",
    "mentions": "adx.net:User[]"
  }
}
```

The schema above defines a record-type named "Post." When published, its ID will combine the repo name with the schema name, eg example.com:Post.
The Post schema defines a set of properties, each with a type. Let's look at each in detail:

text Uses the builtin string type with a length constraint. It also declares that the field is required.
extendedText Uses the builtin string type with no extra constraints.
postedFrom Uses a record with a custom type which is imported from another schema.
mentions Uses a list of records with a custom type which is also imported from another schema.

Inheritance

Schemas are polymorphic, meaning they can extend existing schemas to add or redefine constraints. TODO: what are the rules for "overwriting" parent schema definitions?
Properties take advantage of polymorphism, meaning that properties will accept values of the given type or their child types.
Not all types are extensible. The types which may be extended are:

record A key/value document.
collection An ordered list of records.
view A network endpoint which provides views of the network data. To be described in a future document.
procedure A network endpoint which provides effectful operations. To be described in a future document.

Consequently, all schemas extend from these base types or a subtype of them.
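Because properties accept child types, a consumer needs to walk the `extends` chain. The following is a minimal sketch. It assumes the parent of each schema has already been loaded into a plain dict, and `lineage` and `accepts` are hypothetical names. The overriding rules are still a TODO above, so only the subtype relation is modeled.

```python
from typing import Dict, Iterator

BASE_TYPES = {"record", "collection", "view", "procedure"}


def lineage(schema_id: str, extends: Dict[str, str]) -> Iterator[str]:
    """Yield schema_id, then each ancestor named by "extends", ending at a base type."""
    seen = set()
    current = schema_id
    while current not in BASE_TYPES:
        if current in seen:
            raise ValueError(f"inheritance cycle involving {current!r}")
        seen.add(current)
        yield current
        if current not in extends:
            raise LookupError(f"schema {current!r} is unavailable")
        current = extends[current]
    yield current


def accepts(declared: str, actual: str, extends: Dict[str, str]) -> bool:
    """Polymorphic check: does a property of type `declared` accept a value of `actual`?"""
    return declared in lineage(actual, extends)
```

Raising `LookupError` for an unknown schema leaves the fallback decision to the caller. This keeps with the requirement that software provide a fallback when a schema is unavailable.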
ADXS Structure

The structure of an ADXS document depends on the base type. The attributes and their interpretation are described in the following sub-sections.
Schema attributes

Schema objects may contain the following fields:

  
| Field | Type | Description | Applies to |
|---|---|---|---|
| name | string | The name of the schema. | any |
| extends | string | The base type of the schema. May be "record", "collection", "view", "procedure", or the Schema ID of an existing schema. | any |
| comment | string | A description of the schema. | any |
| props | Properties map | A map of properties which can be included in the record or view and their definitions. | record, view |
   
  
Properties map

The properties map enumerates a list of properties and their definitions. It is used in record and view schemas.

  
| Part | Type | Description |
|---|---|---|
| Keys | string | The "path" of the property. |
| Values | string\|object | A type string or a property definition object. See "Property attributes" for a description of property definition objects. |
   
  
Type string format

Type strings use the following format:
```abnf
type    = ( type-id [ "[]" ] / URI )
type-id = "any" / "boolean" / "integer" / "float" / "string" / "datetime" / "date" / "time" / "uri" / "record" / schema-id / "null"
```

reg-name is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2.
URI is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.
schema-id is defined in "Schema IDs."
Type strings are interpreted with the following rules:

The type-id segment maps to a builtin datatype or a user-defined datatype.
The [] postfix indicates that the type is a list.
The URI indicates an RDF vocabulary definition.
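The following Python sketch applies the three rules to a type string. It is hypothetical. One ambiguity is worth noting: a Schema ID such as `example.com:Post` is also a syntactically valid URI, with scheme `example.com`. This sketch assumes builtin and Schema ID interpretations take precedence over the URI interpretation. The URI check is deliberately loose.

```python
import re
from typing import NamedTuple

BUILTIN_TYPE_IDS = {
    "any", "boolean", "integer", "float", "string",
    "datetime", "date", "time", "uri", "record", "null",
}
REG_NAME = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})+"
SCHEMA_ID = rf"(?:{REG_NAME}@)?{REG_NAME}:{REG_NAME}"
TYPE_ID_RE = re.compile(rf"(?P<id>{SCHEMA_ID}|[a-z]+)(?P<list>\[\])?")
# Loose URI check: an RFC 3986 scheme, ":", then at least one non-space character.
URI_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:\S+")


class TypeRef(NamedTuple):
    kind: str      # "builtin", "schema", or "uri"
    ref: str       # the type-id or URI, without any "[]" suffix
    is_list: bool


def parse_type_string(text: str) -> TypeRef:
    match = TYPE_ID_RE.fullmatch(text)
    if match is not None:
        type_id = match.group("id")
        is_list = match.group("list") is not None
        if type_id in BUILTIN_TYPE_IDS:
            return TypeRef("builtin", type_id, is_list)
        if ":" in type_id:
            return TypeRef("schema", type_id, is_list)
    # The grammar gives URIs no "[]" suffix, so a URI is never a list here.
    if URI_RE.fullmatch(text):
        return TypeRef("uri", text, False)
    raise ValueError(f"unrecognized type string: {text!r}")
```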

Property attributes

Property objects may contain the following fields:

  
| Field | Type | Description | Applies to |
|---|---|---|---|
| type | string\|string[] | The type or types of the property. | any |
| contains | string | The type of the contained property. Follows the rules of `type`. | list |
| required | boolean | Whether records must specify a value for this property. | any |
| minCount | number | The minimum number of values in the list. | list |
| maxCount | number | The maximum number of values in the list. | list |
| minLength | number | The minimum length of the string. | string |
| maxLength | number | The maximum length of the string. | string |
| mimeType | string\|string[] | The supported MIME types of the value. | string |
| pattern | string | A regex defining valid values of the string. | string |
| oneOf | any[] | A list of valid values. | integer, string, integer[], string[] |
| minInclusive | number\|string | The minimum value, inclusive. | integer, float, date, time, datetime, duration |
| minExclusive | number\|string | The minimum value, exclusive. | integer, float, date, time, datetime, duration |
| maxInclusive | number\|string | The maximum value, inclusive. | integer, float, date, time, datetime, duration |
| maxExclusive | number\|string | The maximum value, exclusive. | integer, float, date, time, datetime, duration |
| defaultValue | any | A default value to assign the property if none is provided. | any |
| comment | string | A description of the property. | any |
   
  
TODO: need a way to express per-type constraints when multiple types are supported
TODO: hints about behaviors such as value indexing?
Processes

Schema publishing

Schemas are published in ADX repos in the builtin adxs.org:Definitions collection.
ADXS records

ADX-Schemas are published in a record of the adxs.org:Schema type. This requires some structural modification; for instance, the "Properties map" must be transformed into an array form, as ADX records do not support arbitrarily-keyed "map" constructs.
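The transformation from the properties map to an array can be sketched as follows. The `path` field name is an assumption borrowed from `prop:path` in the Turtle examples later in this document. The exact layout of adxs.org:Schema records is not specified here.

```python
from typing import Any, Dict, List, Union

PropDef = Union[str, Dict[str, Any]]


def props_to_array(props: Dict[str, PropDef]) -> List[Dict[str, Any]]:
    """Turn {path: definition} into [{"path": path, **definition}, ...], keeping order."""
    entries = []
    for path, definition in props.items():
        # A bare type string is shorthand for {"type": <string>}.
        body = {"type": definition} if isinstance(definition, str) else dict(definition)
        if "path" in body:
            raise ValueError(f"property {path!r} already defines a 'path' field")
        entries.append({"path": path, **body})
    return entries


def props_from_array(entries: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Inverse transformation; always produces the expanded object form."""
    props: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        body = dict(entry)
        path = body.pop("path")
        if path in props:
            raise ValueError(f"duplicate property path {path!r}")
        props[path] = body
    return props
```

The round trip is lossless except for the shorthand. `"extendedText": "string"` comes back as `{"type": "string"}`, which has the same meaning.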
Schema-ID assignment

The Schema-ID of a schema can be constructed once published according to the rules defined in "Schema IDs". For instance, a schema record published at adx://example.com/adxs.org:Definitions/Thing will have the Schema-ID of example.com:Thing.
Hosting

Schemas are expected to be kept available by their authors. If hosting is not maintained, developers and systems will be unable to access the schema definition and will fall back to schema-less behaviors. This will degrade the user experience, so schema authors should be conscious of their obligation to continue hosting.
To improve the reliability of schema hosting, it's recommended to operate "schema management" services to which authors can submit their schemas. This will improve availability and can enable some additional validation to be enforced.
Schema consumption

Schemas are referenced by a Schema ID in ADX applications.
The builtin behaviors around schema consumption are kept minimal to ensure ADX is flexible and tolerant of unavailable schema definitions (see "Operation without schemas").
Applications can download schemas using tools similar to software package managers. These schemas can be stored in the app's software repository and leveraged by libraries to provide additional behaviors. Suggested behaviors include:

Write validation which errors when a record does not conform to the schema.
Read validation with configurable behaviors for non-conforming records (skip, warn, error, ignore).
Read coercion which interprets value types into the schema's asserted data types.
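A write-validation helper along these lines could back the first suggested behavior. This Python sketch is hypothetical and has several limitations:

- It checks only presence, string length, `pattern`, list counts, `oneOf`, and numeric bounds; it does not check `type` itself.
- It counts string length in Unicode code points, which is an assumption.
- It treats `pattern` as an unanchored search, to match `sh:pattern`.

```python
import operator
import re
from typing import Any, Dict, List

# Numeric range attributes and the comparison each one requires.
NUMERIC_BOUNDS = (
    ("minInclusive", operator.ge),
    ("minExclusive", operator.gt),
    ("maxInclusive", operator.le),
    ("maxExclusive", operator.lt),
)


def _expand(definition: Any) -> Dict[str, Any]:
    """A bare type string is shorthand for {"type": <string>}."""
    return {"type": definition} if isinstance(definition, str) else definition


def validate_record(record: Dict[str, Any], props: Dict[str, Any]) -> List[str]:
    """Check a record against a props map; an empty result means it conforms."""
    errors = []
    for path, definition in props.items():
        prop = _expand(definition)
        if path not in record:
            if prop.get("required"):
                errors.append(f"{path}: required property is missing")
            continue
        value = record[path]
        if isinstance(value, str):
            # Assumption: lengths are counted in Unicode code points.
            if len(value) < prop.get("minLength", 0):
                errors.append(f"{path}: shorter than minLength {prop['minLength']}")
            if len(value) > prop.get("maxLength", float("inf")):
                errors.append(f"{path}: longer than maxLength {prop['maxLength']}")
            # Unanchored search, matching the semantics of sh:pattern.
            if "pattern" in prop and re.search(prop["pattern"], value) is None:
                errors.append(f"{path}: does not match pattern {prop['pattern']!r}")
        if isinstance(value, list):
            if len(value) < prop.get("minCount", 0):
                errors.append(f"{path}: fewer than minCount {prop['minCount']} values")
            if len(value) > prop.get("maxCount", float("inf")):
                errors.append(f"{path}: more than maxCount {prop['maxCount']} values")
        if "oneOf" in prop:
            items = value if isinstance(value, list) else [value]
            if any(item not in prop["oneOf"] for item in items):
                errors.append(f"{path}: value not in oneOf {prop['oneOf']}")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            for key, holds in NUMERIC_BOUNDS:
                if key in prop and not holds(value, prop[key]):
                    errors.append(f"{path}: violates {key} {prop[key]}")
    return errors


def write_record(record: Dict[str, Any], props: Dict[str, Any]) -> Dict[str, Any]:
    """Write validation: raise instead of writing a non-conforming record."""
    errors = validate_record(record, props)
    if errors:
        raise ValueError("; ".join(errors))
    return record
```

Read validation could reuse `validate_record` unchanged and map a non-empty result to skip, warn, error, or ignore.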

Permission interfaces and resource-descriptions

The ADX ecosystem includes a capabilities-model of permissioning built on UCANs. UCANs require a string format for identifying resources in an ADX repo at the following scopes:

repository
collection
record

These can be constructed using the adx URL (or the equivalent semantics).
Permissioning screens must give users a clear description of the resources being requested. This information cannot be provided by the application as this would represent an attack vector; therefore the descriptions of the resources must be fetched from a trusted source.
While the repository can be identified by the repo's asserted name (eg "bob.com" or "bob@example.com"), the collections and records must be identified using outside information. In these cases, the collection's schema must be fetched and used to provide such a description.
Additional notes

Schema consistency

A consistent view of schema definitions across the network is important for ensuring compatibility. This has two effects on the design of ADX's schema system:

Schemas must use identifiers which are global in scope, and
Schema definitions must remain backwards compatible.

While global identifiers are provided by Schema IDs (which rely on DNS), there are presently no mechanisms to ensure backwards compatibility. It is incumbent on authors and consumers of schemas to ensure that schemas are properly maintained. Tools for publishing schemas are encouraged to validate schema changes against the previous version to reduce errors.
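A publishing tool could run a check like the following before accepting a new version of a schema. The rules are assumptions chosen for illustration, since this document defines no compatibility rules. The check reports any of these changes:

- removing a property,
- changing its type,
- making it required,
- tightening a length or count bound,
- narrowing `oneOf`, or
- adding a new required property.

`breaking_changes` is a hypothetical name.

```python
from typing import Any, Dict, List

INF = float("inf")


def _expand(definition: Any) -> Dict[str, Any]:
    """A bare type string is shorthand for {"type": <string>}."""
    return {"type": definition} if isinstance(definition, str) else dict(definition)


def breaking_changes(old_props: Dict[str, Any], new_props: Dict[str, Any]) -> List[str]:
    """List changes between two props maps that may break existing records or readers."""
    problems = []
    for path, old_def in old_props.items():
        if path not in new_props:
            problems.append(f"{path}: property removed")
            continue
        old, new = _expand(old_def), _expand(new_props[path])
        if old.get("type") != new.get("type"):
            problems.append(f"{path}: type changed from {old.get('type')} to {new.get('type')}")
        if new.get("required") and not old.get("required"):
            problems.append(f"{path}: became required")
        # Upper bounds may only grow and lower bounds may only shrink.
        for key in ("maxLength", "maxCount"):
            if new.get(key, INF) < old.get(key, INF):
                problems.append(f"{path}: {key} tightened to {new[key]}")
        for key in ("minLength", "minCount"):
            if new.get(key, 0) > old.get(key, 0):
                problems.append(f"{path}: {key} tightened to {new[key]}")
        if "oneOf" in new:
            if "oneOf" not in old or not set(old["oneOf"]) <= set(new["oneOf"]):
                problems.append(f"{path}: oneOf narrowed")
    for path, new_def in new_props.items():
        if path not in old_props and _expand(new_def).get("required"):
            problems.append(f"{path}: new required property")
    return problems
```

A non-empty result would suggest publishing the change under a new name, as recommended under "Schema versioning."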
Schema versioning

No formal mechanism for schema versions is defined. If a breaking change to a schema is required, authors are encouraged to publish the schema under a new name.
Note that records are published, addressed, and queried using their containing collection's type. This makes it trivial to interact with records of multiple types.
Operation without schemas

Schemas provide tooling to assist with correctness and compatibility. However, schema definitions may not remain available on the network, so systems may have to operate on records without access to their schemas. Developers may likewise need to override schemas locally.
To counteract this, the record encoding model defines a rich set of value types which provide some core semantics to the information. This ensures that records are easy to transact with in the absence of schema definitions.
Trust model

In the majority of cases, schemas are asserted by applications. This enables the application developers to download and verify the schemas before using them.
However, there are two known cases where schemas must be fetched from an authenticated network source:

Permission screens
General-purpose indexers

While fetching schemas from the ADX network does ensure their authenticity, it does not protect against malicious or erroneous actions by the schema publishers. For instance, the author of a collection schema could change the user-facing descriptions to confuse users; or, the author of a record schema could change the definition to cause indexers to struggle with parsing. At this time, no mitigations for these issues have been defined.
Compatibility with RDF

While not emphasized throughout the document, all semantics and behaviors in this proposal are derived from RDF. All information may be decomposed to RDF graph triples, and all terms are either directly equivalent to existing RDF vocabularies or easily translated to them.
The primary motivation of this choice is to enable ADX data and semantics to be expressed at the boundaries of ADX. External systems will frequently need to interact with the ADX network, and the RDF model will enable ADX data to be encoded using JSON-LD, Turtle, and other RDF formats.
The secondary motivation is to enable graph-model databases to easily encode ADX data. Graph-triples provide a flexible and fine-grained view of information. These properties are especially useful for general-purpose indexers which ADX relies upon to provide aggregated views of the network.
Much of this proposal can be viewed as a DSL atop RDF. The goal of this DSL is not to support all possible RDF constructions. As a consequence, it is not always possible to encode existing RDF vocabularies in ADX. This was seen as an important tradeoff to achieve usability: by removing some features, we can enable developers to learn a small set of concepts and techniques before working productively.
Embedded RDF semantics and vocabulary

The underlying terms of ADX and ADXS are:

All terms identified under "Builtin data types".
All terms defined in this document.
rdf:type Used to assert the schemas of records.
rdfs:Class Used to declare schemas.
rdfs:subClassOf Used to declare the parent class of schemas ("extends").
rdfs:comment The schema "comment".
sh:NodeShape Used to declare schemas.
sh:PropertyShape Used to declare schema properties.
sh:path Used to declare schema properties.
sh:minCount Expresses schema "minCount" and "required".
sh:maxCount Expresses schema "maxCount" and non-list relations.
sh:class Expresses schema "type" when using records.
sh:datatype Expresses schema "type" when using simple builtin datatypes.
sh:in Expresses schema "oneOf".
sh:minExclusive Expresses schema "minExclusive".
sh:maxExclusive Expresses schema "maxExclusive".
sh:minInclusive Expresses schema "minInclusive".
sh:maxInclusive Expresses schema "maxInclusive".
sh:minLength Expresses schema "minLength".
sh:maxLength Expresses schema "maxLength".
sh:pattern Expresses schema "pattern".
sh:defaultValue Expresses schema "defaultValue".

An ADX-Schema is a JSON document which encodes an RDF graph. Transformation rulesets enable ADXS documents to be converted to RDF triples. While the rulesets will require a full specification, the core principles are simple.
The example schema in the "ADX-Schema" section would look like this after transformation to Turtle:
```turtle
@prefix :       <adx://example.com/adxs.org:Definitions/Post#> .
@prefix schema: <adx://adxs.org/adxs.org:Definitions/Schema#> .
@prefix prop:   <adx://adxs.org/adxs.org:Definitions/SchemaProp#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

: a schema: ;
  schema:comment "A little chirp." ;
  schema:props [
    a prop: ;
    prop:path :text ;
    prop:type xsd:string ;
    prop:required true ;
    prop:maxLength 255 ;
  ] ;
  schema:props [
    a prop: ;
    prop:path :extendedText ;
    prop:type xsd:string ;
  ] ;
  schema:props [
    a prop: ;
    prop:path :postedFrom ;
    prop:type <adx://gis.org/adxs.org:Definitions/Location> ;
    sh:maxCount 1 ;
  ] ;
  schema:props [
    a prop: ;
    prop:path :mentions ;
    prop:type <adx://adx.net/adxs.org:Definitions/User> ;
  ] .
```

Because the ADXS vocabulary maintains an equivalence to the XSD, RDFS, and SHACL vocabularies, it is also possible to translate the documents to those more common terms:
```turtle
@prefix :     <adx://example.com/adxs.org:Definitions/Post#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

: a rdfs:Class, sh:NodeShape ;
  rdfs:comment "A little chirp." ;
  sh:property [
    sh:path :text ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:maxLength 255 ;
  ] ;
  sh:property [
    sh:path :extendedText ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path :postedFrom ;
    sh:class <adx://gis.org/adxs.org:Definitions/Location> ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path :mentions ;
    sh:class <adx://adx.net/adxs.org:Definitions/User> ;
  ] .
```
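Comparing the two listings suggests a small set of mapping rules from ADXS prop attributes to SHACL: `prop:type` becomes `sh:datatype` for XSD types and `sh:class` otherwise, `required` becomes `sh:minCount 1`, a single-valued property gets `sh:maxCount 1`, and length limits carry over unchanged. The TypeScript sketch below encodes those inferred rules; it is not a specification of the transformation ruleset. It assumes prop types are already resolved to Turtle terms (prefixed names or `<...>` IRIs), and `isList` is a hypothetical flag for multi-valued props such as `:mentions`, the only property in the listing without `sh:maxCount`.

```typescript
// An ADXS property, with types already written as Turtle terms.
interface AdxsProp {
  path: string;        // e.g. ":text"
  type: string;        // e.g. "xsd:string" or "<adx://gis.org/adxs.org:Definitions/Location>"
  isList?: boolean;    // hypothetical flag for multi-valued props such as :mentions
  required?: boolean;
  maxCount?: number;
  minLength?: number;
  maxLength?: number;
}

// Translate one ADXS prop into SHACL predicate/object pairs, following the
// rules inferred from the two listings.
function toShaclProperty(p: AdxsProp): [string, string][] {
  const out: [string, string][] = [["sh:path", p.path]];
  // XSD types are literal datatypes; anything else is treated as a class of nodes.
  out.push([p.type.startsWith("xsd:") ? "sh:datatype" : "sh:class", p.type]);
  if (p.required) out.push(["sh:minCount", "1"]);
  // Single-valued props allow at most one value; lists keep any explicit maxCount.
  const maxCount = p.isList ? p.maxCount : 1;
  if (maxCount !== undefined) out.push(["sh:maxCount", String(maxCount)]);
  if (p.minLength !== undefined) out.push(["sh:minLength", String(p.minLength)]);
  if (p.maxLength !== undefined) out.push(["sh:maxLength", String(p.maxLength)]);
  return out;
}

// Render the pairs as an `sh:property` blank node in the listing's layout.
function toTurtleBlankNode(p: AdxsProp): string {
  const body = toShaclProperty(p).map(([pred, obj]) => `    ${pred} ${obj} ;`);
  return ["  sh:property [", ...body, "  ]"].join("\n");
}
```

Calling `toTurtleBlankNode({ path: ":text", type: "xsd:string", required: true, maxLength: 255 })` reproduces the first `sh:property` block of the SHACL listing. Keeping the rules as plain data transformations makes it easy to compare them line by line against a future normative ruleset.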


Future work

Blobs

TODO
Views and procedures

TODO
Appendix A. Datatype value interpretations

Data types are asserted by schemas, while value types are asserted by records through their CBOR encoding. In the event of a mismatch, the value may be coerced using the following rules. Each row is a schema data type, each column is a record value type, and a `null` cell means the value coerces to null.

| Data type / value type | `null` | `boolean` | `integer` | `float` | `string` | `list` | `map` | `datetime` | `uri` |
|---|---|---|---|---|---|---|---|---|---|
| `any` | null | boolean | integer | float | string | list | map | datetime | uri |
| `boolean` | null | boolean | 0 → false; 1 → true | null | null | null | null | null | null |
| `integer` | null | false → 0; true → 1 | integer | null | null | null | null | null | null |
| `float` | null | false → 0.0; true → 1.0 | float | float | null | null | null | null | null |
| `string` | null | null | null | null | string | null | null | string | string |
| `duration` | null | null | duration (milliseconds) | null | null | null | null | null | null |
| `datetime` | null | null | datetime (Unix epoch) | null | datetime (ISO-8601) | null | null | datetime | null |
| `time` | null | null | time (Unix epoch) | null | time (ISO-8601) | null | null | time | null |
| `date` | null | null | date (Unix epoch) | null | date (ISO-8601) | null | null | date | null |
| `uri` | null | null | null | null | uri (RFC 3986) | null | null | null | uri |
| `map` | null | null | null | null | null | null | map | null | null |
| `list` | null | null | null | null | null | list | null | null | null |
TODO: should strings support map<string> for language maps?
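The table can be read as a lookup from (data type, value type) to a coercion. The TypeScript sketch below encodes it under several assumptions the draft does not state: decoded CBOR values arrive tagged with their value type (a JS number alone cannot tell a CBOR integer from a float, nor a string a tag-0 datetime from a tag-32 uri), integer timestamps are milliseconds since the Unix epoch, datetimes are normalized to UTC ISO-8601 strings, uri validation only checks for an RFC 3986 scheme prefix, and integers other than 0 and 1 coerce to null for `boolean`. The `Value` and `coerce` names are hypothetical.

```typescript
// Hypothetical tagged form of a decoded CBOR value. The tag is needed because a
// JS number cannot say whether it was a CBOR integer or float, and a JS string
// cannot say whether it carried CBOR tag 0 (datetime) or tag 32 (uri).
type Value =
  | { t: "null" }
  | { t: "boolean"; v: boolean }
  | { t: "integer"; v: number }
  | { t: "float"; v: number }
  | { t: "string"; v: string }
  | { t: "list"; v: Value[] }
  | { t: "map"; v: Record<string, Value> }
  | { t: "datetime"; v: string }
  | { t: "uri"; v: string };

type DataType =
  | "any" | "boolean" | "integer" | "float" | "string" | "duration"
  | "datetime" | "time" | "date" | "uri" | "map" | "list";

// `null` stands for the "null" cells of the table.
type Coerced = Value | boolean | number | string | Value[] | Record<string, Value> | null;

// Assumed: full ISO-8601 date-times must carry an explicit offset.
const DATETIME_RE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$/;
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;
const TIME_RE = /^\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?$/;
const URI_SCHEME_RE = /^[A-Za-z][A-Za-z0-9+.-]*:/; // RFC 3986 scheme prefix only

// Assumed: integer timestamps are milliseconds since the Unix epoch.
function isoFromEpoch(ms: number): string | null {
  const d = new Date(ms);
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}

// Parse an ISO-8601 date-time and normalize it to UTC.
function isoFromDatetime(s: string): string | null {
  if (!DATETIME_RE.test(s)) return null;
  const ms = Date.parse(s);
  return Number.isNaN(ms) ? null : new Date(ms).toISOString();
}

function coerce(dataType: DataType, val: Value): Coerced {
  switch (dataType) {
    case "any":
      return val;
    case "boolean":
      if (val.t === "boolean") return val.v;
      // Only 0 and 1 are listed; other integers are assumed to yield null.
      if (val.t === "integer" && (val.v === 0 || val.v === 1)) return val.v === 1;
      return null;
    case "integer":
      if (val.t === "boolean") return val.v ? 1 : 0;
      return val.t === "integer" ? val.v : null;
    case "float":
      if (val.t === "boolean") return val.v ? 1.0 : 0.0;
      return val.t === "integer" || val.t === "float" ? val.v : null;
    case "string":
      return val.t === "string" || val.t === "datetime" || val.t === "uri" ? val.v : null;
    case "duration":
      return val.t === "integer" ? val.v : null; // milliseconds
    case "datetime":
      if (val.t === "integer") return isoFromEpoch(val.v);
      return val.t === "string" || val.t === "datetime" ? isoFromDatetime(val.v) : null;
    case "date":
    case "time": {
      // A bare ISO-8601 date or time string is kept as-is.
      const bare = dataType === "date" ? DATE_RE : TIME_RE;
      if (val.t === "string" && bare.test(val.v)) return val.v;
      let iso: string | null = null;
      if (val.t === "integer") iso = isoFromEpoch(val.v);
      else if (val.t === "string" || val.t === "datetime") iso = isoFromDatetime(val.v);
      if (iso === null) return null;
      // Keep only the date part or the time part of the UTC date-time.
      return dataType === "date" ? iso.slice(0, 10) : iso.slice(11);
    }
    case "uri":
      if (val.t === "uri") return val.v;
      return val.t === "string" && URI_SCHEME_RE.test(val.v) ? val.v : null;
    case "map":
      return val.t === "map" ? val.v : null;
    case "list":
      return val.t === "list" ? val.v : null;
  }
}
```

For example, `coerce("date", { t: "integer", v: 0 })` yields `"1970-01-01"`, and `coerce("boolean", { t: "integer", v: 2 })` yields `null`. Whether epoch integers should be seconds (as with CBOR tag 1) or milliseconds is one of the details a full specification would need to pin down.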
Appendix B. Builtin definitions

Note: the record shorthand maps to rdfs:Resource. All definitions inherit from this.
adxs.org:Definitions

```json
{
  "name": "Definitions",
  "extends": "collection",
  "comment": "System definitions."
}
```

adxs.org:Collection

Mapped to the collection shorthand in ADXS.
```json
{
  "name": "Collection",
  "extends": "record",
  "comment": "A collection of records."
}
```
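Since `Definitions` extends the `collection` shorthand, `Collection` extends `record`, and `record` maps to `rdfs:Resource`, every built-in's `extends` chain should end at `rdfs:Resource`. A minimal TypeScript sketch of walking that chain follows. The `SHORTHANDS` and `BUILTINS` tables are illustrative transcriptions of this appendix, and the `ancestry` name and its errors for unknown or cyclic definitions are assumptions.

```typescript
// Minimal shape of a built-in definition (only the fields used here).
interface Definition {
  name: string;
  extends: string;
}

// Shorthands described in this appendix.
const SHORTHANDS: Record<string, string> = {
  record: "rdfs:Resource",
  collection: "adxs.org:Collection",
};

// Illustrative registry transcribed from the built-in definitions.
const BUILTINS: Record<string, Definition> = {
  "adxs.org:Definitions": { name: "Definitions", extends: "collection" },
  "adxs.org:Collection": { name: "Collection", extends: "record" },
  "adxs.org:Schema": { name: "Schema", extends: "record" },
  "adxs.org:SchemaProp": { name: "SchemaProp", extends: "record" },
};

// Follow `extends` links from `id` up to rdfs:Resource, returning the chain.
function ancestry(id: string, defs: Record<string, Definition> = BUILTINS): string[] {
  const chain = [id];
  let current = id;
  while (current !== "rdfs:Resource") {
    const def = defs[current];
    if (def === undefined) throw new Error(`Unknown definition: ${current}`);
    // Expand a shorthand such as "record" to the ID it stands for.
    const parent = SHORTHANDS[def.extends] ?? def.extends;
    if (chain.includes(parent)) throw new Error(`Cyclic extends at ${parent}`);
    chain.push(parent);
    current = parent;
  }
  return chain;
}
```

Here `ancestry("adxs.org:Definitions")` returns `["adxs.org:Definitions", "adxs.org:Collection", "rdfs:Resource"]`.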

adxs.org:Schema

```json
{
  "name": "Schema",
  "extends": "record",
  "comment": "A type definition.",
  "props": {
    "name": {
      "type": "string",
      "required": true
    },
    "extends": "string",
    "comment": "string",
    "props": "adxs.org:SchemaProp[]"
  }
}
```

adxs.org:SchemaProp

```json
{
  "name": "SchemaProp",
  "extends": "record",
  "comment": "A schema property definition.",
  "props": {
    "path": {
      "type": "string",
      "required": true
    },
    "type": {
      "type": "string",
      "required": true
    },
    "contains": "string",
    "required": "boolean",
    "minCount": "number",
    "maxCount": "number",
    "minLength": "number",
    "maxLength": "number",
    "mimeType": "string|string[]",
    "pattern": "string",
    "oneOf": "any[]",
    "minInclusive": "number|string",
    "minExclusive": "number|string",
    "maxInclusive": "number|string",
    "maxExclusive": "number|string",
    "defaultValue": "any",
    "comment": "string"
  }
}
```
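In both definitions above, a `props` entry is either a bare type string (`"extends": "string"`) or a property definition object keyed by its path. A minimal TypeScript sketch of expanding that shorthand into uniform definitions follows. `PropDef` covers only a few `SchemaProp` attributes, and `parseType` assumes, from strings such as `"string|string[]"` and `"adxs.org:SchemaProp[]"`, that `|` separates alternatives and a `[]` suffix marks a list; the draft does not define either syntax.

```typescript
// A subset of the adxs.org:SchemaProp attributes, enough for illustration.
interface PropDef {
  path: string;
  type: string;
  required?: boolean;
  maxLength?: number;
  comment?: string;
}

// A `props` value is either a type string or a definition object keyed by path.
type PropsMap = Record<string, string | Omit<PropDef, "path">>;

// Expand shorthand entries so every property has an explicit path and type.
function normalizeProps(props: PropsMap): PropDef[] {
  return Object.entries(props).map(([path, value]): PropDef =>
    typeof value === "string" ? { path, type: value } : { ...value, path },
  );
}

// Split a type string into alternatives, assuming "|" separates them and a
// trailing "[]" marks a list (e.g. "string|string[]").
function parseType(type: string): { base: string; isList: boolean }[] {
  return type.split("|").map((alt) => {
    const isList = alt.endsWith("[]");
    return { base: isList ? alt.slice(0, -2) : alt, isList };
  });
}
```

With this, `"props": "adxs.org:SchemaProp[]"` expands to `{ path: "props", type: "adxs.org:SchemaProp[]" }`, and `parseType` then reports a list of `adxs.org:SchemaProp`.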