OhMeadhbh/SonOfDysfunction.sod

## SonOfDysfunction.sod
; SonOfDysfunction.sod
;
; "SON of Dysfunction" is a serializable object notation (SON) built
; after the experimental "Dysfunction" transfer syntax. "SON of Dysfunction"
; (aka SOD) defines a transfer syntax, a type system, a set of XPath-ish macros
; to identify nodes in transferred data structures and several recommended
; utility macros.
;
; The primary objective of SOD is to have a flexible object notation which
; works like JSON but allows comments. Like JSON, SOD targets systems with
; dynamic object systems, but can still be used with static systems with a
; little work. SOD is backward compatible with JSON in that valid JSON strings
; are also valid SOD strings, but SOD makes a few enhancements:
;
;   1. Comments are totally doable (in fact, you're reading a comment right
;      now.)
;   2. SOD defines 6 primitive types (null, boolean, integer, float, utf string
;      and binary sequences) and 3 constructed types (vectors, arrays and
;      maps.)
;   3. Non-String keys are a thing (also, string keys don't _have_ to be in
;      quotes.)
;   4. The transfer syntax is distinct from the type system.
;   5. We add keywords to reset the state of the parser and to identify the
;      type system the sender assumes the reader will use.
;
; *A Quick Gabfest on Transfer Syntax vs. Type System*
;
; When we say the transfer syntax is distinct from the type system, we mean the
; rules the parser uses to identify various bits of a SOD encoded string are
; distinct from the assumptions it uses about things like how many bits are in
; an integer or floating point number. So if you're parsing the string
; "4294967297" (i.e. 2^32 + 1) it's easy to see it obeys the rules of being a
; number, but if your underlying type system limits integers to 32 bit values,
; you've got a problem.
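;
; A minimal sketch of that split, in Python (an illustration only; SOD does
; not prescribe an implementation language). is_integer_token() applies the
; transfer syntax rule for integers, while fits_signed() stands in for a type
; system check. Both function names and the two's-complement signed range are
; assumptions made for this example.
;

```python
import re

# Transfer syntax rule from this file: digits with an optional leading dash.
INTEGER_TOKEN = re.compile(r"-?[0-9]+")


def is_integer_token(text):
    """Syntax check only: says nothing about how large the value may be."""
    return INTEGER_TOKEN.fullmatch(text) is not None


def fits_signed(text, bits):
    """Type system check: does the value fit a signed integer of `bits` bits?

    The signed two's-complement range is an assumption of this sketch.
    """
    value = int(text)
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


token = "4294967297"  # 2^32 + 1
print(is_integer_token(token))  # True
print(fits_signed(token, 32))   # False
print(fits_signed(token, 64))   # True
```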
;
; It's easy to say "aw heck, just make all integers 64 bits," but this requires
; ALL systems that support your transfer syntax to grok 64 bit values. And what
; about all the cryptography people who want to store integers as arbitrary
; length strings of octets? Krikey! To those people, it might be completely
; legit to have a number like "340282366920938463463374607431768211457".
;
; SOD avoids this problem by separating the parsing rules (what we call the
; transfer syntax) from the underlying type system. Because having sensible
; expectations of how a message receiver will deserialize values in messages
; is a "good thing(tm)" we define three type systems: small, medium and large.
;
; There's more on the type systems later, but for right now it's probably
; enough to know there are type systems targeting 8-bit microcontrollers,
; 32-bit mobile and desktop systems and big iron servers out in the cloud. If
; you've seen a SOD message, you might have seen an exclamation point at the
; beginning of the message followed by a number. This tells the receiver which
; type system the sender expects it to support:
;
;   !0 - no type system (default)
;   !1 - small type system
;   !2 - medium type system
;   !3 - large type system
;
; Oh, and one more thing you're going to see throughout this file: a bare
; exclamation point just means "reset the parser." If you guessed that
; changing the type system also resets the parser, you guessed right.
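;
; Here's a rough Python sketch of how a reader might interpret these markers.
; read_marker() is a hypothetical helper, not part of SOD; it assumes the
; marker has already been split out as its own token, and raising an error for
; anything other than "!", "!0", "!1", "!2" or "!3" is this sketch's own choice.
;

```python
# Marker digits listed in this file, mapped to the type system they select.
TYPE_SYSTEMS = {"0": "none", "1": "small", "2": "medium", "3": "large"}


def read_marker(token):
    """Return the type system named by a '!' token, or None for a bare '!'.

    Either way the caller is expected to reset its parser state.
    """
    if not token.startswith("!"):
        raise ValueError("not a '!' marker: %r" % token)
    selector = token[1:]
    if selector == "":
        return None  # bare '!': reset only
    if selector not in TYPE_SYSTEMS:
        raise ValueError("unknown type system: %r" % token)
    return TYPE_SYSTEMS[selector]


print(read_marker("!2"))  # medium
print(read_marker("!"))   # None
```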
;
; *Back to the Discussion About Parsing*
;
; So the SOD Transfer Syntax is intended to tell the parser how to identify the
; beginning and end of different bits of serialized data. Like many other
; serialization formats, SOD messages consist of a series of tokens separated
; by whitespace, occasionally with comments in between tokens. We use the
; Unicode white space list to identify whitespace in SOD:
;
;   09          Control-I \t HT Horizontal Tab
;   0A          Control-J \n LF Line Feed
;   0B          Control-K \v VT Vertical Tab (does anyone actually use this?)
;   0C          Control-L \f FF Form Feed
;   0D          Control-M \r CR Carriage Return
;   20          ASCII Space Character
; 0085          NEXT-LINE (NEL)
; 00A0          NO-BREAK SPACE
; 1680          OGHAM SPACE MARK
; 2000          EN QUAD
; 2001          EM QUAD
; 2002          EN SPACE
; 2003          EM SPACE
; 2004          THREE-PER-EM SPACE
; 2005          FOUR-PER-EM SPACE
; 2006          SIX-PER-EM SPACE
; 2007          FIGURE SPACE
; 2008          PUNCTUATION SPACE
; 2009          THIN SPACE
; 200A          HAIR SPACE
; 2028          LINE SEPARATOR
; 2029          PARAGRAPH SEPARATOR
; 202F          NARROW NO-BREAK SPACE
; 205F          MEDIUM MATHEMATICAL SPACE
; 3000          IDEOGRAPHIC SPACE
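;
; As a sketch, the table above can be turned into a lookup set and a naive
; token splitter in Python. SOD_WHITESPACE and split_tokens() are names made
; up for this example, and the splitter deliberately ignores strings and
; comments, so it only shows the whitespace rule and is not a real tokenizer.
;

```python
# Every code point from the whitespace table above.
SOD_WHITESPACE = frozenset(
    chr(cp)
    for cp in (
        0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680,
        *range(0x2000, 0x200B),  # EN QUAD through HAIR SPACE
        0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
    )
)


def split_tokens(text):
    """Split text on runs of SOD whitespace (no string or comment handling)."""
    tokens, current = [], []
    for ch in text:
        if ch in SOD_WHITESPACE:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(split_tokens("12\u2003-14\u00a0true\n"))  # ['12', '-14', 'true']
```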
;
; SOD supports "to the end of line" comments starting with the characters:
; semi-colon (;), hash mark (#) and slash-slash digraph (//). Enclosed comments
; are opened with the slash-star digraph (/*) and closed with the star-slash
; digraph (*/).
;
; "End of Line" is identified by these characters:
;   0A          Control-J \n LF Line Feed
;   0B          Control-K \v VT Vertical Tab (does anyone actually use this?)
;   0C          Control-L \f FF Form Feed
;   0D          Control-M \r CR Carriage Return
;               CR + LF
; 0085          NEXT-LINE (NEL)
; 2028          LINE SEPARATOR
; 2029          PARAGRAPH SEPARATOR
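;
; A small Python sketch of a comment stripper that follows these rules.
; strip_comments() is a hypothetical helper. Two things in it are assumptions
; and are not stated in this file: comment characters inside a double-quoted
; string are left alone, and an enclosed comment is replaced by a single space
; so the tokens on either side stay separated.
;

```python
# End-of-line characters listed above (CR + LF needs no special case here,
# because a line comment simply stops at the CR).
EOL = frozenset("\n\v\f\r\u0085\u2028\u2029")


def strip_comments(text):
    """Remove line comments (; # //) and enclosed comments (/* ... */)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Assumption: copy string literals verbatim, honoring backslashes.
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(text[i:j])
            i = j
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")
        elif ch in ";#" or text.startswith("//", i):
            # Skip up to, but not including, the end-of-line character.
            while i < n and text[i] not in EOL:
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = '12 ; twelve\n-14 # negative\r\n/* gone */ true // the end'
print(strip_comments(sample).split())  # ['12', '-14', 'true']
```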
;
; *About the Abstract Type System*
;
; So you know how we said there were three type systems? We probably should
; have said there are three CONCRETE type systems. There's also an "abstract"
; type system that defines what types are available and what they're supposed
; to represent. The concrete type systems define how types in the abstract type
; system are stored in memory.
;
; The key thing we get from the abstract type system is a set of types humans
; can reason about: numbers, booleans, strings, etc. The Abstract Type System
; defines 1 null type, 5 scalar types and 3 constructed types:
;
; Null Types:
;   Null              - CS equivalent of "this page intentionally left blank"
; Scalar Types:
;   Boolean           - True / False
;   Integer           - a whole number (negative, zero or positive)
;   Floating Point    - aka "REAL" numbers
;   Unicode String    - unicode encoded string
;   Octet Sequence    - sequence of octets / "binary data"
; Constructed Types:
;   Vector            - an array of scalar values, each with the same type
;   Array             - an array of values of any type (including arrays)
;   Map               - an array of values indexed by null or scalar values
;
; Null and Boolean literal values are limited to the case sensitive terminals:
;
;   "nil"    - a null value
;   "false"  - a false boolean
;   "true"   - a true boolean
;
; Integers are strings of digits with an optional dash at the beginning. Ex:
;
;   Integers: 12, -14, 000018, -340282366920938463463374607431768211456
;
; Floating point values are like integers, but they have a decimal point. Ex:
;
;   Floats: -3.1415928, 1.1, 0.15625, .7, -.3
;
; Floating point values can also be expressed in scientific notation. Ex:
;
;   Moar Floats: 6.022E23, 9.0125E4, -7.297E-3
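;
; The literal rules so far can be sketched as a tiny Python classifier.
; classify() and its result names are hypothetical. The float pattern is
; deliberately conservative: it accepts exactly the shapes shown in the
; examples above (digits required after the decimal point, upper case "E",
; optional dash on the exponent). Whether SOD accepts other spellings such as
; "1." or "1e5" isn't stated here, so this sketch rejects them.
;

```python
import re

INTEGER = re.compile(r"-?[0-9]+")
FLOAT = re.compile(r"-?[0-9]*\.[0-9]+(?:E-?[0-9]+)?")
KEYWORDS = {"nil": "null", "false": "boolean", "true": "boolean"}


def classify(token):
    """Name the abstract type a literal token looks like, or 'unknown'."""
    if token in KEYWORDS:  # case sensitive, so "True" is not a boolean
        return KEYWORDS[token]
    if INTEGER.fullmatch(token):
        return "integer"
    if FLOAT.fullmatch(token):
        return "float"
    return "unknown"


for token in ["nil", "true", "000018", "-.3", "6.022E23", "True"]:
    print(token, classify(token))
```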
;
; Strings begin with a double-quote character and end with a non-escaped
; double quote:
;
;   Strings: "I am a string", "the next string is a null string", ""
;
; If you need to put a double quote or standard control character in a string,
; escape it with the backslash character:
;
;   "I am a \"string\" with a line-feed at the end.\n"
;
; Defined escape sequences are:
;
;   \" - quote character
;   \\ - backslash character
;   \0 - null (zero) character
;   \b - backspace character (unicode code point 08)
;   \f - form feed (unicode code point 0C)
;   \n - line feed / new line (unicode code point 0A)
;   \r - carriage return (unicode code point 0D)
;   \t - tab character (unicode code point 09)
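;
; A minimal Python sketch of decoding a string literal with the escape
; sequences listed above. decode_string() is a hypothetical helper, and
; treating any other backslash sequence as an error is an assumption; this
; file doesn't say what a parser should do with undefined escapes.
;

```python
# Escape letter -> character, exactly the sequences listed above.
ESCAPES = {
    '"': '"', "\\": "\\", "0": "\0", "b": "\b",
    "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}


def decode_string(literal):
    """Turn a double-quoted SOD string literal into the text it represents."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a double-quoted string")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError("undefined escape at offset %d" % i)
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(decode_string(r'"tab\there"')))  # 'tab\there'
```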