tekknolagi/specification.md

## specification.md

      
Dotted Canonical S-expressions - DCSexps

This is a specification for an s-expression interchange format that attempts to improve upon Rivest's canonical s-expressions [2].
It is an output format for a subset of s-expressions: those containing only pairs and atoms.
It was designed with the following desirable properties in mind:

- Canonicity: (EQUAL? A B) implies that the DCS output of A is byte-for-byte equal to the DCS output of B.
- Non-escaping: arbitrary binary blobs can be contained as atoms without any processing. A consequence of this is that DCSexps can be nested easily.
- Simple to parse: it is much simpler to parse than Rivest's canonical s-expressions because it uses a single . prefix for a pair instead of matching ( and ).
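
The first two properties can be illustrated with a small encoder. This is only a sketch, not a reference implementation: it follows the grammar given in this document, and it assumes that the length is written in ASCII decimal digits (as in netstrings [1]), which the specification does not state explicitly. The names `Sexp` and `dcs_encode`, and the choice to model atoms as `bytes` and pairs as 2-tuples, are hypothetical.

```python
from typing import Tuple, Union

# Assumed in-memory model (not part of the spec): an atom is a bytes object,
# a pair is a 2-tuple of s-expressions.
Sexp = Union[bytes, Tuple["Sexp", "Sexp"]]


def dcs_encode(x: Sexp) -> bytes:
    """Encode an atom as <length>:<data> and a pair as '.' <car> <cdr>."""
    if isinstance(x, bytes):
        # Assumption: the length is written as ASCII decimal digits.
        return str(len(x)).encode("ascii") + b":" + x
    car, cdr = x
    return b"." + dcs_encode(car) + dcs_encode(cdr)


# Canonicity: structurally equal values produce byte-equal output.
a = (b"foo", (b"bar", b""))
b = (b"foo", (b"bar", b""))
assert dcs_encode(a) == dcs_encode(b) == b".3:foo.3:bar0:"

# Non-escaping: an encoded expression can be carried as an atom unchanged,
# even though it contains '.' and ':' bytes.
inner = dcs_encode(a)
outer = dcs_encode((b"payload", inner))
assert outer == b".7:payload14:" + inner
```

The encoder is recursive, so a very long list would need an iterative version or a raised recursion limit in Python.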

The empty symbol (length 0, encoded as 0:) may be used as a stand-in for ().

<DCS> ::= <length> ':' <data[length]>
        | '.' <DCS> <DCS>
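
To give a feel for how little a reader has to do, here is a sketch of a recursive-descent parser for the grammar above. It is one possible reading of the grammar, not a normative algorithm: it assumes the length is ASCII decimal, and the names `dcs_parse` and `dcs_loads` and the bytes/tuple model are hypothetical.

```python
from typing import Tuple, Union

# Assumed in-memory model (not part of the spec): atoms are bytes,
# pairs are 2-tuples.
Sexp = Union[bytes, Tuple["Sexp", "Sexp"]]


def dcs_parse(buf: bytes, pos: int = 0) -> Tuple[Sexp, int]:
    """Parse one DCS value starting at buf[pos]; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("unexpected end of input")
    if buf[pos:pos + 1] == b".":
        # '.' <DCS> <DCS>: a pair is the marker followed by two values.
        car, pos = dcs_parse(buf, pos + 1)
        cdr, pos = dcs_parse(buf, pos)
        return (car, cdr), pos
    # <length> ':' <data[length]>
    colon = buf.find(b":", pos)
    if colon == -1:
        raise ValueError("missing ':' after length")
    digits = buf[pos:colon]
    if not digits.isdigit():  # assumption: ASCII decimal length
        raise ValueError("length must be ASCII decimal digits")
    end = colon + 1 + int(digits)
    if end > len(buf):
        raise ValueError("atom is truncated")
    # The data is taken verbatim; no unescaping is needed.
    return buf[colon + 1:end], end


def dcs_loads(buf: bytes) -> Sexp:
    """Parse a buffer that must contain exactly one DCS value."""
    value, pos = dcs_parse(buf)
    if pos != len(buf):
        raise ValueError("trailing bytes after value")
    return value


assert dcs_loads(b".3:foo.3:bar0:") == (b"foo", (b"bar", b""))
assert dcs_loads(b"4:.1:a") == b".1:a"  # '.' and ':' inside an atom are just data
```

Because a pair always has exactly two children, the parser never has to track a closing delimiter; it only returns the position where the next value starts.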


[1] https://cr.yp.to/proto/netstrings.txt
[2] https://people.csail.mit.edu/rivest/Sexp.txt


### Rationale

Why would you use this instead of regular s-expressions printed with WRITE (which could, in theory, have indentation and pretty printing turned off to produce a function with the canonicity property)?
The advantage is that this format is much more efficient for machine-to-machine interchange, for example between a web server and a client: atoms are length-prefixed, so a reader does not have to scan for delimiters or unescape data.
Why would you use this instead of Rivest's canonical s-expressions? It has a much simpler specification, and the parsing algorithm is a fraction of the complexity.

### Disadvantages

This format is very raw: it only has pairs and atoms. We may need more data types. For that we can use tagged canonical s-expressions (TCS).
<TCS> ::= <tag> <length> ':' <data[length]>
        | '.' <TCS> <TCS>

<tag> ::= 'A'    ;; Atom
        | 'S'    ;; String
        | 'N'    ;; Number
        | 'C'    ;; Character: content must have length one.
        | 'B'    ;; Boolean: content must be 't' or 'f'
        | 'Z'    ;; nil (): content must have length 0

The tag is a single character that indicates which type the content of the atom should be interpreted as.
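
As an illustration of the tagged variant, the following sketch encodes tagged atoms and enforces the content constraints listed in the grammar comments. It makes several assumptions that the text does not settle: the length is ASCII decimal, "length one" for a character means one byte, and the content of a number is passed through as opaque bytes. The names `Atom` and `tcs_encode` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

TAGS = ("A", "S", "N", "C", "B", "Z")  # tag characters from the grammar


@dataclass(frozen=True)
class Atom:
    tag: str     # one of TAGS
    data: bytes  # raw content, stored without escaping


# Assumed model (not part of the spec): a pair is a 2-tuple.
TSexp = Union[Atom, Tuple["TSexp", "TSexp"]]


def check_atom(atom: Atom) -> None:
    """Enforce the per-tag content rules from the grammar comments."""
    if atom.tag not in TAGS:
        raise ValueError("unknown tag: %r" % atom.tag)
    if atom.tag == "C" and len(atom.data) != 1:  # assumption: one byte
        raise ValueError("character content must have length one")
    if atom.tag == "B" and atom.data not in (b"t", b"f"):
        raise ValueError("boolean content must be 't' or 'f'")
    if atom.tag == "Z" and atom.data != b"":
        raise ValueError("nil content must have length 0")


def tcs_encode(x: TSexp) -> bytes:
    """Encode <tag> <length> ':' <data> for atoms, '.' <TCS> <TCS> for pairs."""
    if isinstance(x, Atom):
        check_atom(x)
        length = str(len(x.data)).encode("ascii")  # assumption: ASCII decimal
        return x.tag.encode("ascii") + length + b":" + x.data
    car, cdr = x
    return b"." + tcs_encode(car) + tcs_encode(cdr)


# The list ("hi" #t), written as nested pairs ending in nil.
example = (Atom("S", b"hi"), (Atom("B", b"t"), Atom("Z", b"")))
assert tcs_encode(example) == b".S2:hi.B1:tZ0:"
```

A matching reader would only need to consume one extra tag byte before each length, so the parsing approach for the untagged format carries over.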

### Related work


- MessagePack: "It's like JSON but fast and small"
- bencode: a netstring-based [1] format that can encode lists, used in BitTorrent.
- FlatBuffers: interchange without any parsing