@tekknolagi
Forked from rain-1/dcs.rkt
Created October 24, 2018 16:59
Dotted Canonical S-expressions - DCSexps

This is a specification for an s-expression interchange format that attempts to improve upon Rivest's canonical s-expressions.

It is an output format for a subset of s-expressions: those containing only pairs and atoms.

It was designed with the following desirable properties in mind:

  • It has the canonicity property that (EQUAL? A B) implies the DCS output of A is byte equal to the DCS output of B.
  • It has the non-escaping property that arbitrary binary blobs can be contained as atoms without any processing. A consequence of this is that dcsexps can be nested easily.
  • It is simple to parse: much simpler than Rivest's canonical s-expressions, because a single prefix . marks each pair, so there are no ( and ) to match.

The empty symbol (length 0, written 0:) may be used as a stand-in for ().

<DCS> ::= <length> ':' <data[length]>
        | '.' <DCS> <DCS>
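
As a rough illustration of this grammar, here is a minimal encoder sketch in Python. The in-memory model is an assumption made for the example, not part of the spec: an atom is a bytes object and a pair is a 2-tuple. The names dcs_encode and from_list are hypothetical, and the length is assumed to be written in decimal ASCII without leading zeros, since the grammar does not spell that out.

```python
# Hypothetical in-memory model (an assumption, not part of the spec):
# an atom is a Python bytes object, a pair is a 2-tuple (car, cdr).

def dcs_encode(expr):
    """Serialize an atom (bytes) or a pair (2-tuple) to DCS bytes."""
    if isinstance(expr, bytes):
        # <length> ':' <data[length]>, length assumed to be decimal ASCII
        return str(len(expr)).encode("ascii") + b":" + expr
    if isinstance(expr, tuple) and len(expr) == 2:
        car, cdr = expr
        # '.' <DCS> <DCS>
        return b"." + dcs_encode(car) + dcs_encode(cdr)
    raise TypeError("expected bytes or a 2-tuple")


def from_list(items):
    """Build a proper list out of pairs, ending in the empty atom for ()."""
    result = b""
    for item in reversed(items):
        result = (item, result)
    return result


# The list (a b) is the pairs (a . (b . ())).
print(dcs_encode(from_list([b"a", b"b"])))   # b'.1:a.1:b0:'

# Non-escaping: an encoded document can be carried as an atom, untouched.
inner = dcs_encode((b"x", b"y"))             # b'.1:x1:y'
print(dcs_encode((b"payload", inner)))       # b'.7:payload7:.1:x1:y'
```

Because the encoder has no options and no whitespace to choose, two structurally equal inputs always produce the same bytes, which is the canonicity property in miniature.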

Rationale:

Why would you use this instead of regular s-expressions written with WRITE (where you could, in theory, turn off indentation and pretty printing to get an output function with the canonicity property)?

It is much more efficient for machine-to-machine interchange, for example between a web server and a client, because atoms are length-prefixed and never need escaping or unescaping.

Why would you use this instead of Rivest's canonical s-exps? It has a much simpler specification and the parsing algorithm is a fraction of the complexity.
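
To give a feel for how small a reader can be, here is a sketch of a recursive-descent parser for the DCS grammar in Python. It is an illustration under assumptions, not the reference implementation: the names dcs_parse and dcs_decode are hypothetical, atoms come back as bytes and pairs as 2-tuples, the length is assumed to be decimal ASCII, and leading zeros are rejected on the assumption that canonicity requires exactly one spelling of each length.

```python
def dcs_parse(data, pos=0):
    """Parse one DCS value from bytes, starting at offset pos.

    Returns (value, next_pos). Atoms are bytes, pairs are 2-tuples.
    """
    if data[pos:pos + 1] == b".":
        # '.' <DCS> <DCS>: read the two halves of the pair in order
        car, pos = dcs_parse(data, pos + 1)
        cdr, pos = dcs_parse(data, pos)
        return (car, cdr), pos

    # <length> ':' <data[length]>
    colon = data.find(b":", pos)
    if colon == -1:
        raise ValueError("expected <length> ':' at offset %d" % pos)
    digits = data[pos:colon]
    # Assumption: decimal digits only, no leading zeros (keeps output canonical)
    if not digits.isdigit() or (len(digits) > 1 and digits.startswith(b"0")):
        raise ValueError("bad length at offset %d" % pos)
    end = colon + 1 + int(digits)
    if end > len(data):
        raise ValueError("atom runs past end of input")
    # The payload is copied out as-is; nothing inside it is interpreted
    return data[colon + 1:end], end


def dcs_decode(data):
    """Parse exactly one DCS value and reject trailing bytes."""
    value, pos = dcs_parse(data)
    if pos != len(data):
        raise ValueError("trailing bytes after value")
    return value


print(dcs_decode(b".1:a.1:b0:"))   # (b'a', (b'b', b''))
```

The parser only ever looks at the first byte of a value to decide between the two productions, and it never scans inside atom data, so bytes such as . or : in a payload need no special handling.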

Disadvantages

This format is very raw: it only has pairs and atoms. If more data types are needed, we can use tagged canonical s-exps (TCS):

<TCS> ::= <tag> <length> ':' <data[length]>
        | '.' <TCS> <TCS>

<tag> ::= 'A'    ;; Atom
        | 'S'    ;; String
        | 'N'    ;; Number
        | 'C'    ;; Character: content must have length one.
        | 'B'    ;; Boolean: content must be 't' or 'f'
        | 'Z'    ;; nil (): content must have length 0

The tag is a single character that says which type the atom's content should be interpreted as.
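
A tagged encoder might look like the following Python sketch. Everything beyond the grammar is an assumption made for illustration: the Atom and Pair types and the tcs_encode and check_atom names are hypothetical, "length one" for a Character is read as one byte, and Number content is shown as decimal text because the grammar does not define a number syntax.

```python
from typing import NamedTuple

TAGS = frozenset("ASNCBZ")


class Atom(NamedTuple):
    tag: str      # one of the six tag characters
    data: bytes   # raw content, stored without escaping


class Pair(NamedTuple):
    car: object
    cdr: object


def check_atom(atom):
    """Enforce the per-tag content rules listed in the grammar comments."""
    if atom.tag not in TAGS:
        raise ValueError("unknown tag %r" % (atom.tag,))
    if atom.tag == "C" and len(atom.data) != 1:
        # Assumption: "length one" means one byte
        raise ValueError("Character content must have length one")
    if atom.tag == "B" and atom.data not in (b"t", b"f"):
        raise ValueError("Boolean content must be 't' or 'f'")
    if atom.tag == "Z" and atom.data != b"":
        raise ValueError("nil content must have length 0")


def tcs_encode(expr):
    """Serialize an Atom or Pair to TCS bytes."""
    if isinstance(expr, Pair):
        # '.' <TCS> <TCS>
        return b"." + tcs_encode(expr.car) + tcs_encode(expr.cdr)
    if isinstance(expr, Atom):
        check_atom(expr)
        # <tag> <length> ':' <data[length]>
        length = str(len(expr.data)).encode("ascii")
        return expr.tag.encode("ascii") + length + b":" + expr.data
    raise TypeError("expected Atom or Pair")


# The pair (name . "Ada")
print(tcs_encode(Pair(Atom("A", b"name"), Atom("S", b"Ada"))))   # b'.A4:nameS3:Ada'

# The list (1 #t), ending in a Z atom for ()
lst = Pair(Atom("N", b"1"), Pair(Atom("B", b"t"), Atom("Z", b"")))
print(tcs_encode(lst))                                           # b'.N1:1.B1:tZ0:'
```

Since every tag is a letter and every pair starts with a dot, a reader can still pick the production from the first byte alone, just as in the untagged format.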

Related work

  • MessagePack - "It's like JSON but fast and small"
  • bencode - netstring-based format that can encode lists, used in BitTorrent.
  • FlatBuffers - interchange without any parsing