About SON of Dysfunction
; SonOfDysfunction.sod
;
; "SON of Dysfunction" is a serializable object notation (SON) built
; after the experimental "Dysfunction" transfer syntax. "SON of Dysfunction"
; (aka SOD) defines a transfer syntax, a type system, a set of XPath-ish macros
; to identify nodes in transferred data structures and several recommended
; utility macros.
;
; The primary objective of SOD is mostly to have a flexible object notation
; that works like JSON but allows comments. Like JSON, SOD targets systems with
; dynamic object systems, but can still be used with static systems with a
; little work. SOD is backward compatible with JSON in that valid JSON strings
; are also valid SOD strings, but SOD makes a few enhancements:
;
; 1. Comments are totally doable (in fact, you're reading a comment right
; now.)
; 2. SOD defines 6 primitive types (null, boolean, integer, float, UTF string
; and binary sequences) and 3 constructed types (vectors, arrays and maps.)
; 3. Non-String keys are a thing (also, string keys don't _have_ to be in
; quotes.)
; 4. The transfer syntax is distinct from the type system.
; 5. We add keywords to reset the state of the parser and to identify the
; type system the sender assumes the reader will use.
;
; *A Quick Gabfest on Transfer Syntax vs. Type System*
;
; When we say the transfer syntax is distinct from the type system, we mean the
; rules the parser uses to identify various bits of a SOD encoded string are
; distinct from the assumptions it uses about things like how many bits are in
; an integer or floating point number. So if you're parsing the string
; "4294967297" (i.e. 2^32 + 1) it's easy to see it obeys the rules of being a
; number, but if your underlying type system limits integers to 32 bit values,
; you've got a problem.
;
; It's easy to say "aw heck, just make all integers 64 bits," but this requires
; ALL systems that support your transfer syntax to grok 64 bit values. And what
; about all the cryptography people who want to store integers as arbitrary
; length strings of octets? Crikey! To those people, it might be completely
; legit to have a number like "340282366920938463463374607431768211457".
;
; SOD avoids this problem by separating the parsing rules (what we call the
; transfer syntax) from the underlying type system. Because having sensible
; expectations of how a message receiver will deserialize values in messages
; is a "good thing(tm)" we define three type systems: small, medium and large.
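;
; To make the split concrete, here's a minimal Python sketch (not part of SOD
; itself). The function names and the signed two's-complement range check are
; assumptions made up for this example; nothing above pins down the integer
; width of any particular type system.

```python
def parse_integer_token(token: str) -> int:
    """Transfer syntax step: accept an optional dash followed by digits."""
    digits = token[1:] if token.startswith("-") else token
    if not digits or not all(c in "0123456789" for c in digits):
        raise ValueError(f"not an integer token: {token!r}")
    return int(token)


def fits_signed(value: int, bits: int) -> bool:
    """Type system step: does the value fit a signed integer of this width?"""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


value = parse_integer_token("4294967297")  # 2^32 + 1 obeys the number rules
print(fits_signed(value, 32))              # False: too big for 32 bits
print(fits_signed(value, 64))              # True
```

; The same token parses identically everywhere; only the second step depends
; on what the receiver can actually store.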
;
; There's more on the type systems later, but for right now it's probably
; enough to know there are type systems targeting 8-bit microcontrollers,
; 32-bit mobile and desktop systems and big iron servers out in the cloud. If
; you've seen a SOD message, you might have seen an exclamation point at the
; beginning of the message followed by a number. This tells the receiver which
; type system the sender expects it to support:
;
; !0 - no type system (default)
; !1 - small type system
; !2 - medium type system
; !3 - large type system
;
; Oh, and one more thing you're going to see throughout this file: a bare
; exclamation point just means "reset the parser." If you guessed that
; changing the type system also resets the parser, you guessed right.
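;
; A rough Python sketch of how a reader might interpret these markers. The
; function name, the returned tuple and the assumption that the digit sits
; directly after the "!" with no whitespace are all invented for this example.

```python
TYPE_SYSTEMS = {"0": "none", "1": "small", "2": "medium", "3": "large"}


def read_directive(token: str):
    """Return (resets_parser, type_system_or_None) for a '!' token."""
    if not token.startswith("!"):
        raise ValueError(f"not a directive: {token!r}")
    selector = token[1:]
    if selector == "":
        return True, None                    # bare "!" only resets the parser
    if selector in TYPE_SYSTEMS:
        return True, TYPE_SYSTEMS[selector]  # picking a type system also resets
    raise ValueError(f"unknown type system: {token!r}")


print(read_directive("!"))
print(read_directive("!2"))
```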
;
; *Back to the Discussion About Parsing*
;
; So the SOD Transfer Syntax is intended to tell the parser how to identify the
; beginning and end of different bits of serialized data. Like many other
; serialization formats, SOD messages consist of a series of tokens separated
; by whitespace, occasionally with comments in between tokens. We use the
; Unicode white space list to identify whitespace in SOD:
;
; 09 Control-I \t HT Horizontal Tab
; 0A Control-J \n LF Line Feed
; 0B Control-K \v VT Vertical Tab (does anyone actually use this?)
; 0C Control-L \f FF Form Feed
; 0D Control-M \r CR Carriage Return
; 20 ASCII Space Character
; 0085 NEXT-LINE (NEL)
; 00A0 NO-BREAK SPACE
; 1680 OGHAM SPACE MARK
; 2000 EN QUAD
; 2001 EM QUAD
; 2002 EN SPACE
; 2003 EM SPACE
; 2004 THREE-PER-EM SPACE
; 2005 FOUR-PER-EM SPACE
; 2006 SIX-PER-EM SPACE
; 2007 FIGURE SPACE
; 2008 PUNCTUATION SPACE
; 2009 THIN SPACE
; 200A HAIR SPACE
; 2028 LINE SEPARATOR
; 2029 PARAGRAPH SEPARATOR
; 202F NARROW NO-BREAK SPACE
; 205F MEDIUM MATHEMATICAL SPACE
; 3000 IDEOGRAPHIC SPACE
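;
; If you want the list above as code, here's a minimal Python sketch. The
; names SOD_WHITESPACE and is_sod_whitespace are made up for this example. An
; explicit set is used (rather than a library "is this whitespace" call) so
; membership matches the table exactly.

```python
SOD_WHITESPACE = frozenset(
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680]
    + list(range(0x2000, 0x200B))  # EN QUAD through HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_sod_whitespace(ch: str) -> bool:
    """True if the single character ch is in the whitespace table above."""
    return ord(ch) in SOD_WHITESPACE


print(is_sod_whitespace("\u00a0"))  # NO-BREAK SPACE is in the table
print(is_sod_whitespace("x"))
```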
;
; SOD supports "to the end of line" comments starting with the characters:
; semi-colon (;), hash mark (#) and slash-slash digraph (//). Enclosed comments
; are opened with the slash-star digraph (/*) and closed with the star-slash
; digraph (*/).
;
; "End of Line" is identified by these characters:
; 0A Control-J \n LF Line Feed
; 0B Control-K \v VT Vertical Tab (does anyone actually use this?)
; 0C Control-L \f FF Form Feed
; 0D Control-M \r CR Carriage Return
; CR + LF
; 0085 NEXT-LINE (NEL)
; 2028 LINE SEPARATOR
; 2029 PARAGRAPH SEPARATOR
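;
; Here's a hedged Python sketch of a comment stripper built from the rules
; above. It is an illustration, not the reference parser. Assumptions made for
; this example: enclosed comments don't nest, an unterminated enclosed comment
; runs to the end of input, comment characters inside double-quoted strings
; are left alone, and each comment is replaced with a single space.

```python
LINE_ENDINGS = frozenset("\n\v\f\r\u0085\u2028\u2029")


def strip_comments(text: str) -> str:
    """Replace each comment with one space, leaving quoted strings untouched."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Copy a quoted string verbatim, stepping over backslash escapes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            out.append(text[i:j + 1])
            i = j + 1
        elif ch in ";#" or text.startswith("//", i):
            # To-end-of-line comment: skip up to, not including, the line end.
            while i < n and text[i] not in LINE_ENDINGS:
                i += 1
            out.append(" ")
        elif text.startswith("/*", i):
            # Enclosed comment: skip through the first closing "*/".
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(strip_comments("12 ; twelve\n/* skip me */ 13")))
```

; The line ending itself is kept, so a CR + LF pair after a comment survives
; as ordinary whitespace.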
;
; *About the Abstract Type System*
;
; So you know how we said there were three type systems? We probably should
; have said there are three CONCRETE type systems. There's also an "abstract"
; type system that defines what types are available and what they're supposed
; to represent. The concrete type systems define how types in the abstract type
; system are stored in memory.
;
; The key thing we get from the abstract type system is types humans can
; reason about: numbers, booleans, strings, etc. The Abstract Type System
; defines 1 null type, 5 scalar types and 3 constructed types:
;
; Null Types:
; Null - CS equivalent of "this page intentionally left blank"
; Scalar Types:
; Boolean - True / False
; Integer - a counting number
; Floating Point - aka "REAL" numbers
; Unicode String - unicode encoded string
; Octet Sequence - sequence of octets / "binary data"
; Constructed Types:
; Vector - an array of scalar values, each with the same type
; Array - an array of values of any type (including arrays)
; Map - an array of values indexed by null or scalar values
;
; Null and Boolean literal values are limited to the case sensitive terminals:
;
; "nil" - a null value
; "false" - a false boolean
; "true" - a true boolean
;
; Integers are strings of digits with an optional dash at the beginning. Ex:
;
; Integers: 12, -14, 000018, -340282366920938463463374607431768211456
;
; Floating point values are like integers, but they have a decimal point. Ex:
;
; Floats: -3.1415928, 1.1, 0.15625, .7, -.3
;
; Floating point values can also be expressed in scientific notation. Ex:
;
; Moar Floats: 6.022E23, 9.0125E4, -7.297E-3
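;
; A minimal Python sketch that sorts bare tokens into the literal kinds seen
; so far. The patterns are guesses based on the examples above plus the claim
; that valid JSON is valid SOD: a lower-case "e", a "+" in the exponent and an
; exponent without a decimal point are accepted, while a trailing point with
; no digits (like "1.") is rejected. All of that is an assumption, not spec.

```python
import re

INTEGER = re.compile(r"-?[0-9]+")
FLOAT = re.compile(
    r"-?(?:[0-9]+\.[0-9]+|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"  # has a decimal point
    r"|-?[0-9]+[eE][+-]?[0-9]+"                           # exponent only
)


def classify_literal(token: str) -> str:
    """Name the abstract type of a null, boolean, integer or float token."""
    if token == "nil":
        return "null"
    if token in ("true", "false"):  # terminals are case sensitive
        return "boolean"
    if INTEGER.fullmatch(token):
        return "integer"
    if FLOAT.fullmatch(token):
        return "float"
    raise ValueError(f"not a recognized literal: {token!r}")


for tok in ["nil", "true", "000018", "-.3", "6.022E23"]:
    print(tok, classify_literal(tok))
```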
;
; Strings begin with a double-quote character and end with a non-escaped
; double quote:
;
; Strings: "I am a string", "the next string is a null string", ""
;
; If you need to put a double quote or standard control character in a string,
; escape it with the backslash character:
;
; "I am a \"string\" with a line-feed at the end.\n"
;
; Defined escape sequences are:
;
; \" - quote character
; \\ - backslash character
; \0 - null (zero) character
; \b - backspace character (unicode code point 08)
; \f - form feed (unicode code point 0C)
; \n - line feed / new line (unicode code point 0A)
; \r - carriage return (unicode code point 0D)
; \t - tab character (unicode code point 09)
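;
; The table above translates almost directly into a lookup. Here's a small
; Python sketch that decodes the text between the quotes of a string. The
; function name is made up, and raising an error on any escape not listed
; above is an assumption for this example.

```python
ESCAPES = {
    '"': '"', "\\": "\\", "0": "\0",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}


def unescape(body: str) -> str:
    """Decode backslash escapes in the text between a string's quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"bad escape at offset {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape(r"I am a \"string\" with a line-feed at the end.\n")))
```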