Dotted Canonical S-expressions - DCSexps
This is a specification for an s-expression interchange format that attempts to improve upon Rivest's canonical s-expressions.
It is an output format for a subset of s-expressions: those containing only pairs and atoms.
It was designed with the following desirable properties in mind:
- It has the canonicity property: (EQUAL? A B) implies that the DCS output of A is byte-equal to the DCS output of B.
- It has the non-escaping property: arbitrary binary blobs can be contained as atoms without any processing. A consequence of this is that DCSexps can be nested easily, since one encoded DCSexp can be stored as an atom inside another.
- Simple to parse: It is much simpler to parse than Rivest's canonical s-expressions because we use
The empty symbol (length 0) may be used as a stand-in for
<DCS> ::= <length> ':' <data[length]> | '.' <DCS> <DCS>
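The grammar above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation. The names `encode` and `decode` are hypothetical, atoms are modelled as `bytes` and pairs as 2-tuples, and the sketch assumes that `<length>` is written in ASCII decimal without leading zeros (the grammar does not state the length encoding, but some fixed choice is needed for canonicity).

```python
from typing import Tuple, Union

# Hypothetical in-memory model: an atom is `bytes`, a pair is a 2-tuple.
Sexp = Union[bytes, Tuple["Sexp", "Sexp"]]


def encode(x: Sexp) -> bytes:
    """Write an atom as <length>':'<data> and a pair as '.'<car><cdr>."""
    if isinstance(x, bytes):
        # Assumption: the length is ASCII decimal with no leading zeros.
        return str(len(x)).encode("ascii") + b":" + x
    car, cdr = x
    return b"." + encode(car) + encode(cdr)


def decode(buf: bytes, pos: int = 0) -> Tuple[Sexp, int]:
    """Parse one expression at `pos`; return (value, position after it)."""
    if buf[pos:pos + 1] == b".":
        car, pos = decode(buf, pos + 1)
        cdr, pos = decode(buf, pos)
        return (car, cdr), pos
    colon = buf.index(b":", pos)  # raises ValueError if ':' is missing
    digits = buf[pos:colon]
    if not digits.isdigit() or (len(digits) > 1 and digits.startswith(b"0")):
        raise ValueError("bad length field")
    start = colon + 1
    end = start + int(digits)
    if end > len(buf):
        raise ValueError("truncated atom")
    # The atom's bytes are copied out untouched: no unescaping is needed.
    return buf[start:end], end


inner = encode((b"a", b"b"))     # b".1:a1:b"
outer = encode((b"msg", inner))  # b".3:msg7:.1:a1:b"
```

The last two lines show the nesting property: `inner` is an encoded expression that is stored inside `outer` as an ordinary atom, with no escaping. Because the sketch is recursive, very deeply nested input would hit Python's recursion limit; an explicit stack would avoid that.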
Why would you use this instead of regular s-expressions written with WRITE (where you could, in theory, turn off indentation and pretty-printing to get an output function with the canonicity property)?
The advantage is that this format is much more efficient for machine-to-machine interchange, for example between a web server and a client: length-prefixed atoms can be read without scanning for delimiters or unescaping their content.
Why would you use this instead of Rivest's canonical s-expressions? It has a much simpler specification, and the parsing algorithm is a fraction of the complexity.
This format is very raw: it only has pairs and atoms. We may need more data types. For that we can use tagged canonical s-expressions (TCS).
    <TCS> ::= <tag> <length> ':' <data[length]>
            | '.' <TCS> <TCS>

    <tag> ::= 'A' ;; Atom
            | 'S' ;; String
            | 'N' ;; Number
            | 'C' ;; Character: content must have length one.
            | 'B' ;; Boolean: content must be 't' or 'f'
            | 'Z' ;; nil (): content must have length 0

The tag is a single character that indicates which type the content of the atom should be interpreted as.
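As a rough sketch of how the tagged grammar and its content constraints might be handled, the Python below encodes and decodes TCS values. The names `Atom`, `Pair`, `check`, `encode_tcs` and `decode_tcs` are hypothetical. The sketch assumes the length is ASCII decimal, and it assumes a number's content is its decimal text, which the grammar does not specify.

```python
from typing import NamedTuple, Tuple, Union


class Atom(NamedTuple):
    tag: str     # one of "A", "S", "N", "C", "B", "Z"
    data: bytes  # raw content, interpreted according to `tag`


class Pair(NamedTuple):
    car: "Node"
    cdr: "Node"


Node = Union[Atom, Pair]


def check(atom: Atom) -> Atom:
    """Enforce the per-tag content constraints listed in the grammar."""
    if len(atom.tag) != 1 or atom.tag not in "ASNCBZ":
        raise ValueError("unknown tag")
    if atom.tag == "C" and len(atom.data) != 1:
        raise ValueError("character content must have length one")
    if atom.tag == "B" and atom.data not in (b"t", b"f"):
        raise ValueError("boolean content must be 't' or 'f'")
    if atom.tag == "Z" and atom.data != b"":
        raise ValueError("nil content must have length 0")
    return atom


def encode_tcs(x: Node) -> bytes:
    """Write a pair as '.'<car><cdr> and an atom as <tag><length>':'<data>."""
    if isinstance(x, Pair):
        return b"." + encode_tcs(x.car) + encode_tcs(x.cdr)
    check(x)
    # Assumption: the length is written in ASCII decimal.
    return x.tag.encode("ascii") + str(len(x.data)).encode("ascii") + b":" + x.data


def decode_tcs(buf: bytes, pos: int = 0) -> Tuple[Node, int]:
    """Parse one tagged expression at `pos`; return (node, position after it)."""
    if buf[pos:pos + 1] == b".":
        car, pos = decode_tcs(buf, pos + 1)
        cdr, pos = decode_tcs(buf, pos)
        return Pair(car, cdr), pos
    tag = buf[pos:pos + 1].decode("ascii")
    colon = buf.index(b":", pos + 1)  # raises ValueError if ':' is missing
    digits = buf[pos + 1:colon]
    if not digits.isdigit():
        raise ValueError("bad length field")
    end = colon + 1 + int(digits)
    if end > len(buf):
        raise ValueError("truncated atom")
    return check(Atom(tag, buf[colon + 1:end])), end


# Assumption: the number 42 is carried as its decimal text.
msg = Pair(Atom("A", b"age"), Atom("N", b"42"))
wire = encode_tcs(msg)  # b".A3:ageN2:42"
```

Keeping the tag next to the raw bytes, rather than converting to native types during parsing, leaves the interpretation of each atom to the caller.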