Please note: The following document is an initial draft proposal. All decisions are subject to change. Our present goal is to collect feedback and iterate upon this document. Please feel free to share your suggestions and concerns.
ADX is a federated network for distributing data. It leverages cryptographic signatures and hashes to distribute authenticity proofs along with the data, enabling each node to transact upon the data independently of the data's origin. ADX might therefore be described as an Internet-native database in which records are replicated across nodes.
As a consequence of relying on authenticity proofs, ADX must exchange "canonical" records. That is, ADX records must be transmitted in their original encoding and structure in order to validate the signatures and hashes which comprise the proofs. This stands in contrast to the RESTful model of the Web in which "representations" of records are exchanged and therefore may be constructed at the time of exchange. While ADX records may be stored and queried in a variety of forms, they must be transmitted in their canonical form.
The canonical form implies an encoding, layout, and underlying data model which is shared by all ADX nodes. Again, this stands in contrast to Internet applications which interoperate through messaging. ADX is a record-oriented network and must provide a sufficiently general data model for a wide variety of applications. Developers will likely view ADX as a database within their software stack, and while records can be copied into other databases and systems, any record to be transmitted over the network must be written to ADX in its canonical form.
In addition to the data model of the canonical form, ADX applications must also agree upon the semantics of the exchanged records, generally referred to as "schemas." In ADX, schemas inform the data model and vice versa; this document therefore encompasses encoding, data model, and semantics.
Since the start of Bluesky, schemas have been highlighted by engineers both inside and outside the team as a linchpin of the project's success. This interest reflects many factors: the impact of schemas on developer experience, their relevance to the evolvability of the network, and the strength of opinion among subject-matter experts in the space. Schemas are one of the most hotly debated topics in the community, and a good solution will consider as many of the known solutions as possible.
Three dominant philosophies have emerged in decentralized networks: global term definitions via RDF, convention-oriented freeform objects, and networked programs such as Ethereum's smart contracts. It's worth giving each a brief overview and discussing their strengths and weaknesses:
**Global terms (RDF).** RDF is a highly-general model for creating unambiguous semantics. It uses a directed graph to organize all information into "triples" of facts. Many developers are only aware of RDF via JSON-LD, a format which provides an object-document abstraction over RDF while preserving the graph model.

RDF's strengths are its rigour, its standards-driven governance, its wide adoption in the Fediverse, and its flexibility. Its weaknesses are its complexity, poor DX, and unfamiliarity outside of certain developer niches. JSON-LD has demonstrated that well-designed tooling can overcome the weaknesses of RDF when consuming schemas, but authoring new schemas (vocabularies) remains a daunting task.
**Freeform objects.** Convention-driven systems have a rich history in indie-hacker culture and are often proposed as a solution for decentralized networks. These models typically use an object-document model and leave developers free to populate the document however they see fit, often with a few baked-in conventions such as indexing upon a "type" attribute.

The strengths of freeform objects include evolvability, flexibility, and ease of understanding. The weaknesses include the slow development of conventions, lack of coordination between separate teams/orgs, and frequent incompatibilities between applications. Freeform objects often rely upon application libraries and can enable bazaar-style innovation. However, the innovation process can be frustrating for end-users and developers alike, as incompatibilities surface frequently and can be slow to resolve.
**Networked programs.** Blockchain-based systems like Ethereum have recently advanced the use of a shared runtime which abstracts the network. Programs on the runtime (smart contracts) encapsulate state behind a set of APIs which enforce schemas and business logic.

Networked programs benefit from their intuitive nature: developers can think of them like regular programs, or perhaps like Web APIs. The bytecode is publicly available and can be connected to the source code to clearly explain its behavior. However, the current models suffer from a great deal of runtime overhead, the gas-fee incentive to pre-optimize, and subtle complexities which lead to bugs. Bluesky is not using a blockchain; however, there are interesting lessons to be learned from networked programs. Declarative, machine-readable definitions could be distributed over the network to instruct general-purpose nodes to enforce useful behaviors.
This proposal builds upon RDF's global terms with tooling inspired by the freeform-objects philosophy. The intent is to provide optional-but-recommended mechanics which assist developers without overly constraining them.
A slightly richer set of "value types" is used in the encoding of ADX records than is common in, e.g., JSON. This enables ADX records to self-describe with some higher-level semantics, facilitating schema-free operation.
Schemas provide additional semantics, descriptions, constraints, and properties to ADX's value types. They assist in the interpretation and consistent usages of ADX records. Schemas may be published to the ADX network in a machine-readable form, enabling convenient distribution and access by software. However, most usages of schemas are optional, ensuring that the network remains flexible. Any software which depends on network-accessible schemas must additionally provide a fallback behavior if a schema is not available.
The semantics of schemas are based on RDF. This serves multiple goals: to benefit from the rigor of RDF, to leverage existing RDF software and techniques, and to ensure interoperability with systems outside of ADX. Many of the systems in this document can be described as a DSL over RDF.
The design goals of this schema system are as follows:

- Schema evolvability. The schema system must enable developers to extend, evolve, and repurpose the network and its data.
- Ideally this should occur with minimal upfront social consensus – developers should not need to convince a "spec owner" to modify their schema in order to make changes.
- Developer convenience. Schemas and their tooling should be obvious, easy to use, and empowering.
- Tools should not overburden developers with strictness or busy-work. When there are guard-rails, those rails should inform the developer, not control them.
- We should recognize that the deployment of new schemas is a common task and cannot depend on slow-moving standards bodies.
- Reduced incompatibilities. Incompatibilities are a coordination challenge which emerges during independent development. Often they result in an inconsistent user experience: markup showing up in text, features missing from content (eg absent embedded media), or unexpected behaviors between applications. These issues affect users and place a burden upon developers to introduce features without creating compatibility issues. Tooling which helps with coordination between teams can improve evolvability, as developers can more easily predict the effects of their software.
- Avoid NIH. There is a large body of prior work available for schemas. When possible, this system should leverage existing technology in order to benefit from its software, corpus of specifications, and expertise.
- When developing a novel solution, the reasoning for divergence should be clear and justified.
- Hash-friendly encodings. In order to validate authenticity proofs, ADX records must reliably serialize to a canonical form.
- While JSON has wide adoption, it fails to provide a canonical encoding without additional rules such as sorted keys, discarded duplicate keys, and string-encoded decimal numbers. This requirement demands either a modified JSON encoding or some other encoding format (see the sketch following this list).
- Unambiguous terms. Applications should agree upon which values are being shared.
- Ambiguous terms – eg keynames in documents which are not well-defined – risk creating collisions which are difficult to resolve (and often difficult to detect and debug).
- User-friendly descriptions. ADX software must provide UIs such as permission prompts which describe the data being affected.
- Secure trust model. Any schema information must have a clear and secure trust model.
- This is trivial for most applications as they choose which schemas to integrate and then act upon the records according to their own validation. However, in some situations the schemas are chosen by third-parties such as when providing permission prompts, enabling an application to misrepresent actions to the user or to the system. All usages of schemas must consider the effects of malicious actors.
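To make the canonical-encoding concern concrete, here is a minimal sketch (TypeScript, using Node's built-in crypto module; none of these names come from ADX) showing that two JSON serializations of the same logical object produce different hashes:

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// Two JSON serializations of the same logical record, differing only
// in key order.
const a = '{"text":"hello","type":"Post"}';
const b = '{"type":"Post","text":"hello"}';

console.log(JSON.parse(a).type === JSON.parse(b).type); // true: same data
console.log(sha256(a) === sha256(b)); // false: different bytes, different hashes
```

Any encoding used for authenticity proofs must eliminate this ambiguity, either by constraining the serializer or by choosing a format with a deterministic encoding.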
ADX records are encoded using CBOR.
"Value types" establish the kinds of values in the data model.
The data model supports a subset of CBOR's available value types:
| Value type | Encoding |
| --- | --- |
| `null` | A CBOR simple value (major type 7, subtype 24) with a simple value of 22 (null). |
| `boolean` | A CBOR simple value (major type 7, subtype 24) with a simple value of 21 (true) or 20 (false). |
| `integer` | A CBOR integer (major type 0 or 1), choosing the shortest byte representation. |
| `float` | A CBOR floating-point number (major type 7). All floating-point values MUST be encoded as 64 bits (additional type value 27), even for integral values. |
| `string` | A CBOR string (major type 3). |
| `list` | A CBOR array (major type 4), where each element of the list is added, in order, as a value of the array according to its type. |
| `map` | A CBOR map (major type 5), where each entry is represented as a member of the CBOR map. The entry key is expressed as a CBOR string (major type 3). |
| `datetime` | A CBOR datetime (major type 6, tag 0): an ISO-8601-formatted date-time string. |
| `uri` | A CBOR uri (major type 6, tag 32): an RFC-3986-formatted URI string. |
TODO: do we need uint53, int54?
TODO: do we need bignums?
TODO: do we need binary? I'm inclined to say no: we face problems when people stuff binary into records rather than using blobs
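As a rough illustration of what writing a record might look like, here is a sketch using the `cborg` JavaScript CBOR library (one of several CBOR implementations; the proposal does not mandate any particular library). Tagged types such as datetime (tag 0) and uri (tag 32) would additionally require the library's tag-encoding support, which is elided here:

```typescript
import { encode, decode } from "cborg";

// A record using only untagged value types: strings, lists, and maps.
const record = {
  type: "example.com:Post",
  text: "hello, world",
  mentions: ["adx://bob.com"],
};

// cborg encodes deterministically (e.g. sorted map keys) by default,
// which suits ADX's requirement for a canonical, hash-friendly form.
const bytes: Uint8Array = encode(record);

console.log(decode(bytes)); // round-trips to the original structure
```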
"Data types" establish the kinds of properties in the data model. They are used in schemas to define how properties should be interpreted.
Every data type defines an interpretation for each value type. In cases where no useful interpretation can be created, the interpretation is mapped to Null. For example, a DateTime property can validly be set to the String and Null values; for all other values, the interpretation resorts to Null.
All data types (builtin and user-defined) follow the RDF model. Consequently, the type-identifiers for the builtin simple types and user-defined record types expand to URIs. They are mapped to shortened terms for use in records.
The data model supports a set of simple data types:
| Data type | Primary value type | RDF term |
| --- | --- | --- |
| `any` | – | – |
| `boolean` | `boolean` | xsd:boolean |
| `integer` | `integer` | xsd:integer |
| `float` | `float` | xsd:double |
| `string` | `string` | xsd:string |
| `datetime` | `datetime` | xsd:dateTime |
| `date` | `datetime` | xsd:date |
| `time` | `datetime` | xsd:time |
| `uri` | `uri` | xsd:anyURI |
The data model also supports a set of complex types:
| Data type | Primary value type | RDF term | Description |
| --- | --- | --- | --- |
| `record` | `map` | rdfs:Resource | A key/value document. |
| `list` | `list` | rdf:Seq | An ordered array. |
The complex types can contain all other simple or complex types. As explained in "Data layout," all information is published in records, and records can contain records.
Records may be assigned custom types. New record types are created by publishing a schema.
A record type is any valid URI. Tools may attempt to download a machine-readable schema from the URI, but this is not required.
Records contain the following standard fields:
| Field | Type | Description |
| --- | --- | --- |
| `type` | `string` | Declares the type of a record. Must be a valid Schema ID or URI. |
The data layout establishes the units of network-transmissible data. It includes the following three groupings:
- Repository. The dataset of a single actor; contains a set of collections.
- Collection. An ordered list of records.
- Record. A key/value document.
These groupings establish addressability as well as the available network queries. For instance, a Repository is addressed by its DID and can be fetched in its entirety, while a Collection is addressed by a DID + its ID and can be fetched partially with range queries. It is not possible to transmit smaller units of data than these three groupings; for instance, a subset of a record cannot be requested over the network.
Additional properties and behaviors for each grouping are defined below.
Repositories are the dataset of a single "actor" (ie user) in the ADX network. Every user has a single repository which is identified by a DID.
A collection is an ordered list of records. Every collection has a type and is identified by the Schema ID of its type. Collections may contain records of any type and cannot enforce any constraints on them.
A record is a key/value document. It is the smallest unit of data which can be transmitted over the network. Every record has a type and is identified by a key which is chosen by the writing software.
The builtin "Definitions collection," identified by adxs.org:Definitions
, is used to store schema definitions.
Schemas are documents which declare new types. They define:
- Semantic meanings,
- Descriptive metadata,
- Shape-constraints, and
- Behavior hints.
The primary purpose of schemas is to help developers reach consensus on how they interact with the system. Their secondary purpose is to provide tooling which reduces bugs and incompatibilities; however, most tooling is chosen by applications and is therefore optional.
All schemas are published as records in the builtin `adxs.org:Definitions` collection. This makes it possible to reference schemas using only the repository name and schema keyname. We call this the "Schema ID":
```
schema-id   = repo-name ":" schema-name
repo-name   = [ reg-name "@" ] reg-name
schema-name = reg-name
```
`reg-name` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2.
For example, the schema with ID `example.com:Song` can be found in the `example.com` repository, in the `adxs.org:Definitions` collection, under the `Song` key.
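For illustration, a hypothetical helper (the function name is ours, not part of the proposal) that maps a Schema ID onto the adx URL of its defining record:

```typescript
// Hypothetical helper: resolve a Schema ID such as "example.com:Song"
// to the adx URL of its record in the builtin Definitions collection.
function schemaIdToRecordUrl(schemaId: string): string {
  // schema-id = repo-name ":" schema-name (repo names contain no
  // colons per the grammar above, so the last ":" is the separator)
  const i = schemaId.lastIndexOf(":");
  if (i < 0) throw new Error(`invalid schema ID: ${schemaId}`);
  const repoName = schemaId.slice(0, i);
  const schemaName = schemaId.slice(i + 1);
  return `adx://${repoName}/adxs.org:Definitions/${schemaName}`;
}

schemaIdToRecordUrl("example.com:Song");
// => "adx://example.com/adxs.org:Definitions/Song"
```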
The `adx` URL scheme is used to address records in the ADX network.
adx-url = "adx://" authority path [ "?" query ] [ "#" fragment ]
authority = repo-name / did
repo-name = [ reg-name "@" ] reg-name
path = [ "/" schema-id [ "/" record-id ] ]
coll-ns = reg-name
coll-id = 1*pchar
record-id = 1*pchar
`did` is defined in https://w3c.github.io/did-core/#did-syntax.

`reg-name` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2.

`pchar` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.3.

`query` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.4.

`fragment` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.5.

`schema-id` is defined in "Schema IDs."
The fragment segment only has meaning if the URL references a record. Its value maps to a subrecord with the matching `"id"` value.
Some example `adx` URLs:
| Addresses | URL |
| --- | --- |
| Repository | `adx://bob.com` |
| Repository | `adx://bob@work.com` |
| Repository | `adx://did:ion:EiAnKD8-jfdd0MDcZUjAbRgaThBrMxPTFOxcnfJhI7Ukaw` |
| Collection | `adx://bob.com/example.com:songs` |
| Record | `adx://bob.com/example.com:songs/3yI5-c1z-cc2p-1a` |
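A rough parser for these URLs might look like the following sketch; the regex character classes are simplified relative to the RFC 3986 productions referenced above, so treat it as illustrative rather than normative:

```typescript
interface AdxUrl {
  authority: string; // repo name or DID
  schemaId?: string; // collection, identified by the Schema ID of its type
  recordId?: string;
  fragment?: string;
}

function parseAdxUrl(url: string): AdxUrl {
  const m =
    /^adx:\/\/([^/?#]+)(?:\/([^/?#]+))?(?:\/([^/?#]+))?(?:\?[^#]*)?(?:#(.*))?$/.exec(url);
  if (!m) throw new Error(`not an adx URL: ${url}`);
  return { authority: m[1], schemaId: m[2], recordId: m[3], fragment: m[4] };
}

parseAdxUrl("adx://bob.com/example.com:songs/3yI5-c1z-cc2p-1a");
// => { authority: "bob.com", schemaId: "example.com:songs",
//      recordId: "3yI5-c1z-cc2p-1a", fragment: undefined }
```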
ADX-Schema (or "ADXS") is a DSL for schemas in ADX.
It can be helpful to start from an example:
```json
{
  "name": "Post",
  "extends": "record",
  "comment": "A little chirp.",
  "props": {
    "text": {
      "type": "string",
      "required": true,
      "maxLength": 255
    },
    "extendedText": "string",
    "postedFrom": "gis.org:Location",
    "mentions": "adx.net:User[]"
  }
}
```
The schema above defines a record-type named "Post." When published, its ID will combine the repo name with the schema name, eg `example.com:Post`.
The post record defines a set of properties, each with a type. Let's look at each in detail:

- `text`: Uses the builtin string type with a length constraint. It also declares that the field is required.
- `extendedText`: Uses the builtin string type with no extra constraints.
- `postedFrom`: Uses a record with a custom type which is imported from another schema.
- `mentions`: Uses a list of records with a custom type which is also imported from another schema.
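For reference, a record conforming to this schema might look like the following. The nested `gis.org:Location` and `adx.net:User` structures are invented here purely for illustration; their real shapes would be defined by their own schemas:

```json
{
  "type": "example.com:Post",
  "text": "Just setting up my adx",
  "postedFrom": {
    "type": "gis.org:Location",
    "name": "San Francisco"
  },
  "mentions": [
    { "type": "adx.net:User", "id": "adx://alice.com" }
  ]
}
```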
Schemas are polymorphic, meaning they can extend existing schemas to add or redefine constraints. TODO: what are the rules for "overwriting" parent schema definitions?
Properties take advantage of polymorphism, meaning that a property will accept values of the given type or any of its child types.
Not all types are extensible. The types which may be extended are:

- `record`: A key/value document.
- `collection`: An ordered list of records.
- `view`: A network endpoint which provides views of the network data. To be described in a future document.
- `procedure`: A network endpoint which provides effectful operations. To be described in a future document.
Consequently, all schemas extend from these base types or a subtype of them.
The structure of an ADXS document depends on the base type. The attributes and their interpretation are described in the following sub-sections.
Schema objects may contain the following fields:
| Field | Type | Description | Applies to |
| --- | --- | --- | --- |
| `name` | `string` | The name of the schema. | any |
| `extends` | `string` | The base type of the schema. May be "record", "collection", "view", "procedure", or the Schema ID of an existing schema. | any |
| `comment` | `string` | A description of the schema. | any |
| `props` | Properties map | A map of properties which can be included in the record or view, and their definitions. | record, view |
The properties map enumerates a list of properties and their definitions. It is used in record and view schemas.
| | Type | Description |
| --- | --- | --- |
| Keys | `string` | The "path" of the property. |
| Values | `string\|object` | A type string or a property definition object. See "Property attributes" for a description of property definition objects. |
Type strings follow this format:

```
type    = ( type-id [ "[]" ] / URI )
type-id = "any" / "boolean" / "integer" / "float" / "string" / "datetime" / "date" / "time" / "uri" / "record" / schema-id / "null"
```
`reg-name` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2.

`URI` is defined in https://www.rfc-editor.org/rfc/rfc3986#section-3.

`schema-id` is defined in "Schema IDs."
Type strings are interpreted with the following rules:

- The `type-id` segment maps to a builtin datatype or a user-defined datatype.
- The `[]` postfix indicates that the type is a list.
- A `URI` indicates an RDF vocabulary definition.
Property objects may contain the following fields:
| Field | Type | Description | Applies to |
| --- | --- | --- | --- |
| `type` | `string\|string[]` | The type or types of the property. | any |
| `contains` | `string` | The type of the contained property. Follows the rules of `type`. | list |
| `required` | `boolean` | Do records need to specify a value for this property? | any |
| `minCount` | `number` | The minimum number of values in the list. | list |
| `maxCount` | `number` | The maximum number of values in the list. | list |
| `minLength` | `number` | The minimum length of the string. | string |
| `maxLength` | `number` | The maximum length of the string. | string |
| `mimeType` | `string\|string[]` | The supported MIME types of the value. | string |
| `pattern` | `string` | A regex defining valid values of the string. | string |
| `oneOf` | `any[]` | A list of valid values. | integer, string, integer[], string[] |
| `minInclusive` | `number\|string` | The minimum value, inclusive. | integer, float, date, time, datetime, duration |
| `minExclusive` | `number\|string` | The minimum value, exclusive. | integer, float, date, time, datetime, duration |
| `maxInclusive` | `number\|string` | The maximum value, inclusive. | integer, float, date, time, datetime, duration |
| `maxExclusive` | `number\|string` | The maximum value, exclusive. | integer, float, date, time, datetime, duration |
| `defaultValue` | `any` | A default value to assign the property if none is provided. | any |
| `comment` | `string` | A description of the property. | any |
TODO: need a way to express per-type constraints when multiple types are supported
TODO: hints about behaviors such as value indexing?
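As an illustration of how several attributes combine, a hypothetical property definition object (all names and values here are ours):

```json
{
  "type": "string",
  "required": true,
  "minLength": 2,
  "maxLength": 64,
  "pattern": "^[a-z][a-z0-9-]*$",
  "comment": "A URL-safe slug identifying the record."
}
```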
Schemas are published in ADX repos in the builtin `adxs.org:Definitions` collection.
ADX-Schemas are published in a record of the `adxs.org:Schema` type. This requires some structural modification; for instance, the "Properties map" must be transformed into an array form, as ADX records do not support "map" constructs.
Once a schema is published, its Schema ID can be constructed according to the rules defined in "Schema IDs". For instance, a schema record published at `adx://example.com/adxs.org:Definitions/Thing` will have the Schema ID `example.com:Thing`.
Schemas are expected to be kept available by their authors. If hosting is not maintained, developers and systems will be unable to access the schema definition and will fall back to schema-less behaviors. This will degrade the user experience, and schema authors should therefore be conscious of their obligation to continue hosting.
To improve the reliability of schema hosting, it's recommended to operate "schema management" services to which authors can submit their schemas. This will improve availability and can enable some additional validation to be enforced.
Schemas are referenced by a Schema ID in ADX applications.
The builtin behaviors around schema consumption are kept minimal to ensure ADX is flexible and tolerant of unavailable schema definitions (see "Operation without schemas").
Applications can download schemas using tools similar to software package managers. These schemas can be stored in the app's software repository and leveraged by libraries to provide additional behaviors. Suggested behaviors include:
- Write validation which errors when a record does not conform to the schema.
- Read validation with configurable behaviors for non-conforming records (skip, warn, error, ignore).
- Read coercion which interprets value types into the schema's asserted data types.
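A sketch of the read-validation behavior, assuming some schema-validation routine `validate` that returns a list of violations (the names and shape here are ours, not part of the proposal):

```typescript
type OnInvalid = "skip" | "warn" | "error" | "ignore";

function readRecords(
  records: unknown[],
  validate: (record: unknown) => string[], // returns violations; empty if valid
  onInvalid: OnInvalid,
): unknown[] {
  const out: unknown[] = [];
  for (const record of records) {
    const violations = validate(record);
    if (violations.length === 0 || onInvalid === "ignore") {
      out.push(record);
      continue;
    }
    if (onInvalid === "error") throw new Error(violations.join("; "));
    if (onInvalid === "warn") {
      console.warn("non-conforming record:", violations);
      out.push(record);
    }
    // "skip": silently drop the non-conforming record
  }
  return out;
}
```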
The ADX ecosystem includes a capabilities model of permissioning built on UCANs. UCANs require a string format for identifying resources in an ADX repo, covering the following scopes:
- repository
- collection
- record
These can be constructed using the `adx` URL (or the equivalent semantics).
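For example, the three scopes might be written as follows (illustrative only; the exact resource-string syntax is not fixed by this document):

```
adx://bob.com                                      repository scope
adx://bob.com/example.com:songs                    collection scope
adx://bob.com/example.com:songs/3yI5-c1z-cc2p-1a   record scope
```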
Permissioning screens must give users a clear description of the resources being requested. This information cannot be provided by the application as this would represent an attack vector; therefore the descriptions of the resources must be fetched from a trusted source.
While the repository can be identified by the repo's asserted name (eg "bob.com" or "bob@example.com"), collections and records must be identified by outside information. In these cases, the collection's schema must be fetched and used to provide such a description.
A consistent view of schema definitions across the network is important for ensuring compatibility. This has two effects on the design of ADX's schema system:
- Schemas must use identifiers which are global in scope, and
- Schema definitions must remain backwards compatible.
While global identifiers are provided by Schema IDs (which rely on DNS), there are presently no mechanisms to ensure backwards compatibility. It is incumbent on authors and consumers of schemas to ensure that schemas are properly maintained. Tools for publishing schemas are encouraged to validate schema changes against the previous version to reduce errors.
No formal mechanism for schema versions is defined. If a breaking change to a schema is required, authors are encouraged to publish the schema under a new name.
Note that records are published, addressed, and queried using their containing collection's type. This makes it trivial to interact with records of multiple types.
Schemas provide tooling to assist with correctness and compatibility. However, it is possible that schema definitions will not remain available on the network, meaning systems may have to operate on records without access to their schemas. Developers may likewise need to locally override schemas.
To counteract this, the record encoding model defines a rich set of value types which provide some core semantics for the information. This ensures that records remain easy to transact with in the absence of schema definitions.
In the majority of cases, schemas are asserted by applications. This enables the application developers to download and verify the schemas before using them.
However, there are two known cases where schemas must be fetched from an authenticated network source:
- Permission screens
- General-purpose indexers
While fetching schemas from the ADX network does ensure their authenticity, it does not protect against malicious or erroneous actions by the schema publishers. For instance, the author of a collection schema could change the user-facing descriptions to confuse users; or, the author of a record schema could change the definition to cause indexers to struggle with parsing. At this time, no mitigations for these issues have been defined.
While not emphasized throughout the document, all semantics and behaviors in this proposal are derived from RDF. All information may be decomposed to RDF graph triples, and all terms are either directly equivalent to existing RDF vocabularies or easily translated to them.
The primary motivation of this choice is to enable ADX data and semantics to be expressed at the boundaries of ADX. External systems will frequently need to interact with the ADX networks, and the RDF model will enable ADX data to be encoded using JSON-LD, Turtle, and other RDF formats.
The secondary motivation is to enable graph-model databases to easily encode ADX data. Graph-triples provide a flexible and fine-grained view of information. These properties are especially useful for general-purpose indexers which ADX relies upon to provide aggregated views of the network.
Much of this proposal can be viewed as a DSL atop RDF. The goal of this DSL is not to support all possible RDF constructions. As a consequence, it is not always possible to encode existing RDF vocabularies in ADX. This was seen as an important tradeoff to achieve usability: by removing some features, we can enable developers to learn a small set of concepts and techniques before working productively.
The underlying terms of ADX and ADXS are:
- All terms identified under "Builtin data types".
- All terms defined in this document.
- `rdf:type`: Used to assert the schemas of records.
- `rdfs:Class`: Used to declare schemas.
- `rdfs:subClassOf`: Used to declare the parent class of schemas ("extends").
- `rdfs:comment`: The schema "comment".
- `sh:NodeShape`: Used to declare schemas.
- `sh:PropertyShape`: Used to declare schema properties.
- `sh:path`: Used to declare schema properties.
- `sh:minCount`: Expresses schema "minCount" and "required".
- `sh:maxCount`: Expresses schema "maxCount" and non-list relations.
- `sh:class`: Expresses schema "type" when using records.
- `sh:datatype`: Expresses schema "type" when using simple builtin datatypes.
- `sh:in`: Expresses schema "oneOf".
- `sh:minExclusive`: Expresses schema "minExclusive".
- `sh:maxExclusive`: Expresses schema "maxExclusive".
- `sh:minInclusive`: Expresses schema "minInclusive".
- `sh:maxInclusive`: Expresses schema "maxInclusive".
- `sh:minLength`: Expresses schema "minLength".
- `sh:maxLength`: Expresses schema "maxLength".
- `sh:pattern`: Expresses schema "pattern".
- `sh:defaultValue`: Expresses schema "defaultValue".
An ADX-Schema is a JSON document which encodes an RDF graph. Transformation rulesets enable ADXS documents to be converted to RDF triples. While the rulesets will require a full specification, the core principles are simple.
The example schema in the "ADX-Schema" section would look like this after transformation to Turtle:
```turtle
@prefix : <adx://example.com/adxs.org:Definitions/Post#> .
@prefix schema: <adx://adxs.org/adxs.org:Definitions/Schema#> .
@prefix prop: <adx://adxs.org/adxs.org:Definitions/SchemaProp#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

: a schema: ;
  schema:comment "A little chirp." ;
  schema:props [
    a prop: ;
    prop:path :text ;
    prop:type xsd:string ;
    prop:required true ;
    prop:maxLength 255 ;
  ] ;
  schema:props [
    a prop: ;
    prop:path :extendedText ;
    prop:type xsd:string ;
  ] ;
  schema:props [
    a prop: ;
    prop:path :postedFrom ;
    prop:type <adx://gis.org/adxs.org:Definitions/Location> ;
    sh:maxCount 1 ;
  ] ;
  schema:props [
    a prop: ;
    prop:path :mentions ;
    prop:type <adx://adx.net/adxs.org:Definitions/User> ;
  ] .
```
Because the ADXS vocabulary maintains an equivalence to the XSD, RDFS, and SHACL vocabularies, it is also possible to translate the documents to those more common terms:
```turtle
@prefix : <adx://example.com/adxs.org:Definitions/Post#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

: a rdfs:Class, sh:NodeShape ;
  rdfs:comment "A little chirp." ;
  sh:property [
    sh:path :text ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:maxLength 255 ;
  ] ;
  sh:property [
    sh:path :extendedText ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path :postedFrom ;
    sh:class <adx://gis.org/adxs.org:Definitions/Location> ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path :mentions ;
    sh:class <adx://adx.net/adxs.org:Definitions/User> ;
  ] .
```
TODO
TODO
Data types are asserted by schemas while value types are asserted by records through the CBOR encoding. In the event of a mismatch, the value may be coerced using the following rules.
| Data type \ Value type | null | boolean | integer | float | string | list | map | datetime | uri |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| any | null | boolean | integer | float | string | list | map | datetime | uri |
| boolean | null | boolean | 0 → false; 1 → true | null | null | null | null | null | null |
| integer | null | false → 0; true → 1 | integer | null | null | null | null | null | null |
| float | null | false → 0.0; true → 1.0 | float | float | null | null | null | null | null |
| string | null | null | null | null | string | null | null | string | string |
| duration | null | null | duration (milliseconds) | null | null | null | null | null | null |
| datetime | null | null | datetime (Unix epoch) | null | datetime (ISO-8601) | null | null | datetime | null |
| time | null | null | time (Unix epoch) | null | time (ISO-8601) | null | null | time | null |
| date | null | null | date (Unix epoch) | null | date (ISO-8601) | null | null | date | null |
| uri | null | null | null | null | uri (RFC-3986) | null | null | null | uri |
| map | null | null | null | null | null | null | map | null | null |
| list | null | null | null | null | null | list | null | null | null |
TODO: should strings support map<string> for language maps?
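A sketch of how two rows of the coercion table might be implemented (the `Value` type and function names are ours; everything without a useful interpretation collapses to null, per the rules above):

```typescript
type Value = null | boolean | number | string | Value[] | { [k: string]: Value };

// Row "data type: boolean": only 0 and 1 coerce; all else maps to null.
function coerceToBoolean(v: Value): boolean | null {
  if (typeof v === "boolean") return v;
  if (v === 0) return false;
  if (v === 1) return true;
  return null;
}

// Row "data type: integer": booleans coerce to 0/1; all else maps to null.
function coerceToInteger(v: Value): number | null {
  if (typeof v === "number" && Number.isInteger(v)) return v;
  if (v === false) return 0;
  if (v === true) return 1;
  return null;
}

coerceToBoolean(1);    // => true
coerceToInteger(true); // => 1
coerceToInteger("5");  // => null: strings are not coerced to integers
```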
Note: the `record` shorthand maps to rdfs:Resource. All definitions inherit from this.
```json
{
  "name": "Definitions",
  "extends": "collection",
  "comment": "System definitions."
}
```
Mapped to the `collection` shorthand in ADXS.
```json
{
  "name": "Collection",
  "extends": "record",
  "comment": "A collection of records."
}
```
```json
{
  "name": "Schema",
  "extends": "record",
  "comment": "A type definition.",
  "props": {
    "name": {
      "type": "string",
      "required": true
    },
    "extends": "string",
    "comment": "string",
    "props": "adxs.org:SchemaProp[]"
  }
}
```
```json
{
  "name": "SchemaProp",
  "extends": "record",
  "comment": "A schema property definition.",
  "props": {
    "path": {
      "type": "string",
      "required": true
    },
    "type": {
      "type": "string",
      "required": true
    },
    "contains": "string",
    "required": "boolean",
    "minCount": "number",
    "maxCount": "number",
    "minLength": "number",
    "maxLength": "number",
    "mimeType": "string|string[]",
    "pattern": "string",
    "oneOf": "any[]",
    "minInclusive": "number|string",
    "minExclusive": "number|string",
    "maxInclusive": "number|string",
    "maxExclusive": "number|string",
    "defaultValue": "any",
    "comment": "string"
  }
}
```
I saw some messages on the Trust over IP Foundation Slack about this draft, so I read it and wanted to share some of the experiences we had while developing the layered schemas project (https://layeredschemas.org) and applying it to semantic interoperability problems for health data. Here, your aim appears to be mostly structural interoperability. We are working on semantic interoperability, where `<height>160cm</height>` and `{"h":1.6, "unit":"m"}` would be considered equal. My comments are mainly on the schema language itself.

Schema evolvability and extensions: Our approach to this is using schema overlays. Such overlays can add/remove fields, modify semantics of the underlying schema, and in general adjust the schema to fit a particular use case. This is similar to what you call "inheritance"; however, we call it "schema composition", and the result is a "schema variant".
Developer convenience: Working with JSON-LD and RDF is not easy. Conversely, JSON schemas are ubiquitous. So instead of limiting the schema language to a JSON-LD-based syntax, we decided to support JSON schemas for schemas and overlays. This allows already-available and standardized schemas to be incorporated into the ecosystem. An existing JSON schema can be extended using overlays to add annotations, semantics, or different data types for fields. Something similar can be done for your schemas. Early in the process, we also switched to using labeled property graphs instead of RDF.
Hash-friendly encodings: Layered schemas include an "attributeIndex" annotation for each field. When we ingest data, we create a graph including these annotations and can then reconstruct it consistently. Something similar can be adopted.
Value types: There are multiple conventions for how data elements are represented. If one system operates with Unix-epoch-style timestamps and another uses RFC 3339, you cannot really interoperate. Because of this, we decided not to require data types in schema specifications. We perform type coercions/translations based on a known set of data types, including the xsd: namespace and JSON types/formats, when we need to translate data between different variants. The type system remains extensible, though; for instance, we have a type for "Measurement"s, composed of a value and a unit.
The required data types in our case are "structural" types: "Value", "Object", "Array", "Polymorphic". A "Value" contains an array of bytes; an "Object" contains an unordered set of elements; etc.
In our case, a "Polymorphic" data type refers to truly polymorphic data, that is, a data element that can be one of several types. That
appears to be lacking in your case.
We also support "Reference" data types. These are simply fields that are already defined by an existing external schema.