Skip to content

Instantly share code, notes, and snippets.

@fogus
Forked from rain-1/dcs.rkt
Created July 26, 2019 18:44
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save fogus/0e226926e1252d4169372c8cbf2b8b61 to your computer and use it in GitHub Desktop.
Save fogus/0e226926e1252d4169372c8cbf2b8b61 to your computer and use it in GitHub Desktop.
Dotted Canonical S-expressions - DCSexps
#lang racket
;; printing s-exps as DCS and TDCS, plus examples of what DCS and TDCS look like
(define (dcs l)
(cond ((pair? l)
(begin
(display ".")
(dcs (car l))
(dcs (cdr l))))
((null? l)
(display "0:"))
((symbol? l)
(display (string-length (symbol->string l)))
(display ":")
(display l))))
(define (tdcs l)
(cond ((pair? l)
(begin
(display ".")
(tdcs (car l))
(tdcs (cdr l))))
((null? l)
(display "Z0:"))
((symbol? l)
(display "A")
(display (string-length (symbol->string l)))
(display ":")
(display l))
((string? l)
(display "S")
(display (string-length l))
(display ":")
(display l))
))
(dcs '(function T get_x () (body (return (const 3)))))
(newline)
;; .8:function.1:T.5:get_x.0:..4:body..6:return..5:const.0:0:0:0:
(tdcs '(function T get_x () (body (return (const 3)))))
(newline)
;; .A8:function.A1:T.A5:get_x.Z0:..A4:body..A6:return..A5:const.Z0:Z0:Z0:Z0:
(tdcs '(var-decl string (foo "")))
(newline)
;; .A8:var-decl.A6:string..A3:foo.S0:Z0:Z0:

Dotted Canonical S-expressions - DCSexps

This is a specification for an s-expression interchange format that attempts to improve upon [2]rivest's canonical s-expressions.

It is an output format for a subset of s-expressions. Those containing only pairs and atoms.

It was designed with the following desirable properties in mind:

  • It has the canonicity property that (EQUAL? A B) implies the DCS output of A is byte equal to the DCS output of B.
  • It has the non-escaping property that arbitrary binary blobs can be contained as atoms without any processing. A consequence of this is that dcsexps can be nested easily.
  • Simple to parse: It is much simpler to parse compared to rivest's canonical s-expressions because we use . instead of ( and ).

The empty symbol (length 0) may be used as a stand-in for ().

<DCS> ::= <length> ':' <data[length]>
        | '.' <DCS> <DCS>

Rationale:

Why would you use this instead of regular s-expressions with the WRITE feature (that you could in theory turn off indentation, pretty printing to produce a function with the canonicity property)?

The value of this over that is that it is much more efficient in machine to machine interchange. For example between a web server and client.

Why would you use this instead of rivest canonical s-exps? It has a much simpler specification and the parsing algorithm is a fraction of the complexity.

Disadvantages

This format is very raw, it only has pairs and atoms. We may need more data types. For that we can use tagged canonical s-exps.

<TCS> ::= <tag> <length> ':' <data[length]>
        | '.' <TCS> <TCS>

<tag> ::= 'A'    ;; Atom
        | 'S'    ;; String
        | 'N'    ;; Number
        | 'C'    ;; Character: content must have length one.
        | 'B'    ;; Boolean: content must be 't' or 'f'
        | 'Z'    ;; nil (): content must have length 0

tag is a single character that explains which type to interpret the content of the atom as.

Related work

  • messagepack - "It's like JSON but fast an small"
  • bencode - netstring based format that can encode lists, used in bittorrent.
  • flatbuffers - interchange without any parsing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment