Last active
August 29, 2015 14:06
Gist: OhMeadhbh/89574f94b0afdec02298
About SON of Dysfunction
; SonOfDysfunction.sod
;
; "SON of Dysfunction" is a serializable object notation (SON) built
; after the experimental "Dysfunction" transfer syntax. "SON of Dysfunction"
; (aka SOD) defines a transfer syntax, a type system, a set of XPath-ish macros
; to identify nodes in transferred data structures and several recommended
; utility macros.
;
; The primary objective of SOD is to have a flexible object notation which
; works like JSON but allows comments. Like JSON, SOD targets systems with
; dynamic object systems, but can still be used with static systems with a
; little work. SOD is backward compatible with JSON in that valid JSON strings
; are also valid SOD strings, but SOD makes a few enhancements:
;
; 1. Comments are totally doable (in fact, you're reading a comment right
;    now.)
; 2. SOD defines 6 primitive types (null, boolean, integer, float, Unicode
;    string and octet sequence) and 3 constructed types (vectors, arrays and
;    maps.)
; 3. Non-string keys are a thing (also, string keys don't _have_ to be in
;    quotes.)
; 4. The transfer syntax is distinct from the type system.
; 5. We add keywords to reset the state of the parser and to identify the
;    type system the sender assumes the reader will use.
;
; *A Quick Gabfest on Transfer Syntax vs. Type System*
;
; When we say the transfer syntax is distinct from the type system, we mean the
; rules the parser uses to identify various bits of a SOD encoded string are
; distinct from the assumptions it makes about things like how many bits are in
; an integer or floating point number. So if you're parsing the string
; "4294967297" (i.e. 2^32 + 1) it's easy to see it obeys the rules of being a
; number, but if your underlying type system limits integers to 32 bit values,
; you've got a problem.
;
; It's easy to say "aw heck, just make all integers 64 bits," but this requires
; ALL systems that support your transfer syntax to grok 64 bit values. And what
; about all the cryptography people who want to store integers as arbitrary
; length strings of octets? Krikey! To those people, it might be completely
; legit to have a number like "340282366920938463463374607431768211457"
; (i.e. 2^128 + 1).
;
; SOD avoids this problem by separating the parsing rules (what we call the
; transfer syntax) from the underlying type system. Because having sensible
; expectations of how a message receiver will deserialize values in messages
; is a "good thing(tm)" we define three type systems: small, medium and large.
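;
; A minimal Python sketch of the split described above: one step checks only
; that a token obeys the integer syntax, and a separate step checks whether
; the value fits the receiver's integers. The function names and the 32 and
; 64 bit widths are assumptions made for illustration; this section does not
; define the limits of the small, medium and large type systems.
;
/*
```python
import re

# Transfer syntax: digits with an optional leading dash, no size limit.
INTEGER_TOKEN = re.compile(r"-?[0-9]+")


def is_integer_token(token: str) -> bool:
    """True if the token obeys the integer syntax, whatever its magnitude."""
    return INTEGER_TOKEN.fullmatch(token) is not None


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits a two's-complement signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


token = "4294967297"  # 2^32 + 1
print(is_integer_token(token))      # syntax check only
print(fits_signed(int(token), 32))  # hypothetical 32-bit receiver
print(fits_signed(int(token), 64))  # hypothetical 64-bit receiver
```
*/
;
; The syntax check accepts the token regardless of size; only the second
; step depends on which type system the receiver implements.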
;
; There's more on the type systems later, but for right now it's probably
; enough to know there are type systems targeting 8-bit microcontrollers,
; 32-bit mobile and desktop systems and big iron servers out in the cloud. If
; you've seen a SOD message, you might have seen an exclamation point at the
; beginning of the message followed by a number. This tells the receiver which
; type system the sender expects it to support:
;
;   !0 - no type system (default)
;   !1 - small type system
;   !2 - medium type system
;   !3 - large type system
;
; Oh, and one more thing you're going to see throughout this file: a bare
; exclamation point just means "reset the parser." If you guessed that
; changing the type system also resets the parser, you guessed right.
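;
; A rough Python sketch of reading that marker from the front of a message.
; It assumes the digit follows the "!" directly with no whitespace, and the
; function name and return shape are hypothetical; the exact grammar of the
; marker is not spelled out in this section.
;
/*
```python
TYPE_SYSTEMS = {0: "none", 1: "small", 2: "medium", 3: "large"}


def read_marker(message: str):
    """Return (reset, type_system, rest) for the start of a message.

    reset is True when the message starts with "!".
    type_system is 0..3 when a digit follows the "!", otherwise None
    (a bare "!" resets the parser without naming a type system).
    """
    if not message.startswith("!"):
        return False, None, message
    if len(message) > 1 and message[1] in "0123":
        return True, int(message[1]), message[2:]
    return True, None, message[1:]


reset, type_system, rest = read_marker("!2 true")
print(reset, TYPE_SYSTEMS[type_system], repr(rest))
```
*/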
;
; *Back to the Discussion About Parsing*
;
; So the SOD Transfer Syntax is intended to tell the parser how to identify the
; beginning and end of different bits of serialized data. Like many other
; serialization formats, SOD messages consist of a series of tokens separated
; by whitespace, occasionally with comments between tokens. We use the
; characters with the Unicode White_Space property to identify whitespace in
; SOD:
;
;   09    Control-I  \t  HT  Horizontal Tab
;   0A    Control-J  \n  LF  Line Feed
;   0B    Control-K  \v  VT  Vertical Tab (does anyone actually use this?)
;   0C    Control-L  \f  FF  Form Feed
;   0D    Control-M  \r  CR  Carriage Return
;   20                       ASCII Space Character
;   0085                     NEXT-LINE (NEL)
;   00A0                     NO-BREAK SPACE
;   1680                     OGHAM SPACE MARK
;   2000                     EN QUAD
;   2001                     EM QUAD
;   2002                     EN SPACE
;   2003                     EM SPACE
;   2004                     THREE-PER-EM SPACE
;   2005                     FOUR-PER-EM SPACE
;   2006                     SIX-PER-EM SPACE
;   2007                     FIGURE SPACE
;   2008                     PUNCTUATION SPACE
;   2009                     THIN SPACE
;   200A                     HAIR SPACE
;   2028                     LINE SEPARATOR
;   2029                     PARAGRAPH SEPARATOR
;   202F                     NARROW NO-BREAK SPACE
;   205F                     MEDIUM MATHEMATICAL SPACE
;   3000                     IDEOGRAPHIC SPACE
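;
; The table above can be written down as a lookup set. This is a small Python
; sketch; the names SOD_WHITESPACE and is_sod_whitespace are made up for
; illustration. An explicit set is used rather than Python's str.isspace(),
; which also accepts U+001C through U+001F, and those are not in the table.
;
/*
```python
SOD_WHITESPACE = frozenset(
    [chr(c) for c in range(0x09, 0x0E)]        # HT, LF, VT, FF, CR
    + [" ", "\u0085", "\u00a0", "\u1680"]      # space, NEL, NBSP, Ogham
    + [chr(c) for c in range(0x2000, 0x200B)]  # EN QUAD .. HAIR SPACE
    + ["\u2028", "\u2029", "\u202f", "\u205f", "\u3000"]
)


def is_sod_whitespace(ch: str) -> bool:
    """True if the single character ch is in the whitespace table."""
    return ch in SOD_WHITESPACE


print(len(SOD_WHITESPACE))
print(is_sod_whitespace("\u3000"), is_sod_whitespace("\x1f"))
```
*/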
;
; SOD supports "to the end of line" comments starting with the characters:
; semi-colon (;), hash mark (#) and slash-slash digraph (//). Enclosed comments
; are opened with the slash-star digraph (/*) and closed with the star-slash
; digraph (*/).
;
; "End of Line" is identified by these characters:
;
;   0A    Control-J  \n  LF  Line Feed
;   0B    Control-K  \v  VT  Vertical Tab (does anyone actually use this?)
;   0C    Control-L  \f  FF  Form Feed
;   0D    Control-M  \r  CR  Carriage Return
;                            CR + LF
;   0085                     NEXT-LINE (NEL)
;   2028                     LINE SEPARATOR
;   2029                     PARAGRAPH SEPARATOR
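;
; As a rough illustration of the "to the end of line" rule, the Python sketch
; below drops those comments while leaving the end-of-line characters and the
; contents of string literals alone. It is an assumption-laden sketch, not a
; full tokenizer: the function name is hypothetical, enclosed (slash-star)
; comments are not handled, and the string handling covers only double quotes
; and backslash escapes.
;
/*
```python
EOL_CHARS = frozenset("\n\v\f\r\u0085\u2028\u2029")


def strip_line_comments(text: str) -> str:
    """Remove ';', '#' and '//' comments that appear outside of strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch in ";#" or text.startswith("//", i):
            # Skip up to, but not including, the end-of-line character.
            while i < n and text[i] not in EOL_CHARS:
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(strip_line_comments('true ; yes\n"a;b" // c\r\nnil # d')))
```
*/
;
; Because CR and LF are both in the set, a CR + LF pair ends the comment at
; the CR and both characters are passed through untouched.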
;
; *About the Abstract Type System*
;
; So you know how we said there were three type systems? We probably should
; have said there are three CONCRETE type systems. There's also an "abstract"
; type system that defines what types are available and what they're supposed
; to represent. The concrete type systems define how types in the abstract type
; system are stored in memory.
;
; The key thing the abstract type system gives us is types humans can reason
; about: numbers, booleans, strings, etc. The Abstract Type System
; defines 1 null type, 5 scalar types and 3 constructed types:
;
;   Null Types:
;     Null           - CS equivalent of "this page intentionally left blank"
;   Scalar Types:
;     Boolean        - True / False
;     Integer        - a whole number (negative, zero or positive)
;     Floating Point - aka "REAL" numbers
;     Unicode String - Unicode encoded string
;     Octet Sequence - sequence of octets / "binary data"
;   Constructed Types:
;     Vector         - an array of scalar values, each with the same type
;     Array          - an array of values of any type (including arrays)
;     Map            - an array of values indexed by null or scalar values
;
; Null and Boolean literal values are limited to the case-sensitive terminals:
;
;   "nil"   - a null value
;   "false" - a false boolean
;   "true"  - a true boolean
;
; Integers are strings of digits with an optional dash at the beginning. Ex:
;
;   Integers: 12, -14, 000018, -340282366920938463463374607431768211456
;
; Floating point values are like integers, but they have a decimal point. Ex:
;
;   Floats: -3.1415928, 1.1, 0.15625, .7, -.3
;
; Floating point values can also be expressed in scientific notation. Ex:
;
;   Moar Floats: 6.022E23, 9.0125E4, -7.297E-3
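;
; The literal rules above for null, boolean, integer and floating point
; tokens can be sketched as a small Python classifier. The patterns are
; inferred from the examples only, so treat them as assumptions: the float
; pattern requires a decimal point with at least one digit after it and an
; upper-case "E" for the exponent. parse_literal is a hypothetical name, and
; the Python int and float results stand in for whatever a concrete type
; system would actually store.
;
/*
```python
import re

KEYWORDS = {"nil": None, "true": True, "false": False}  # case sensitive
INTEGER = re.compile(r"-?[0-9]+")
FLOAT = re.compile(r"-?[0-9]*\.[0-9]+(?:E[+-]?[0-9]+)?")


def parse_literal(token: str):
    """Return (type_name, value) for a null, boolean or number token."""
    if token in KEYWORDS:
        value = KEYWORDS[token]
        return ("null" if value is None else "boolean", value)
    if INTEGER.fullmatch(token):
        return ("integer", int(token))
    if FLOAT.fullmatch(token):
        return ("float", float(token))
    raise ValueError(f"not a null, boolean or number literal: {token!r}")


for tok in ["nil", "true", "000018", "-.3", "6.022E23"]:
    print(tok, parse_literal(tok))
```
*/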
;
; Strings begin with a double-quote character and end with a non-escaped
; double-quote character:
;
;   Strings: "I am a string", "the next string is a null string", ""
;
; If you need to put a double quote or standard control character in a string,
; escape it with the backslash character:
;
;   "I am a \"string\" with a line-feed at the end.\n"
;
; Defined escape sequences are:
;
;   \" - quote character
;   \\ - backslash character
;   \0 - null (zero) character
;   \b - backspace character (unicode code point 08)
;   \f - form feed (unicode code point 0C)
;   \n - line feed / new line (unicode code point 0A)
;   \r - carriage return (unicode code point 0D)
;   \t - tab character (unicode code point 09)
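;
; A minimal Python sketch of decoding the escape sequences listed above. It
; works on the text between the opening and closing double quotes. Treating
; any other backslash sequence as an error is an assumption, as is the
; function name; the list above may not be the complete set.
;
/*
```python
ESCAPES = {
    '"': '"',
    "\\": "\\",
    "0": "\0",
    "b": "\b",
    "f": "\f",
    "n": "\n",
    "r": "\r",
    "t": "\t",
}


def unescape(body: str) -> str:
    """Decode backslash escapes in the text between the double quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"undefined escape at offset {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape(r'I am a \"string\" with a line-feed at the end.\n')))
```
*/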
; 345678901234567890123456789012345678901234567890123456789012345678901234567890