@OhMeadhbh
Created December 22, 2014 11:34
DDN: Son of Son of Dysfunction
; ddn.ddn
;
; This document describes the Dynamic Data Notation (DDN). DDN is a superset
; of JavaScript Object Notation and fills a similar role. It differs from JSON
; in a few important ways:
;
; * DDN supports comments. Semi-colons (;), hash marks (#) and C++ style
; "slash slash" di-graphs (//) all begin a "to the end of the line" style
; comment. C-style "slash splat" (/*) and "splat slash" (*/) digraphs
; enclose bounded comments.
;
; * Dates are first-class types. That means you can specify dates directly
; with RFC3339 format.
;
; * UTF-8. It's UTF-8 turtles all the way down.
;
; * Primitive types as keys. Any primitive type (null, boolean, integer,
; float or date) can be a key in an associative array.
;
; * Concatenation of vectors. If you put two strings or two arrays next to
; each other with only white space between them, they're concatenated
; together.
{
null : "Note that the null item can be the key for an associative array.",
false : "True and false can also be keys for associative arrays.",
true : "And remember; characters that normally denote comments "
"are interpreted as members of strings inside quotes /* and double "
"quotes.*/" /* This is a comment, however. */,
0 : "Did you notice we placed three strings right next to each other?",
1 : "When the parser sees multiple vectors next to each other, it "
"simply concatenates them.",
2 : [ 0, 1, 2 ] [3, 4, 5], ; This works with Arrays and Objects as well.
"foo" : "Using numbers or strings as indexes into \"normal\" or "
"associative arrays works pretty much the same as with JSON.",
3.0 : "Associative arrays can have keys with any primitive type.",
3.1 : "But watch out, keys are converted into canonical string form when "
"used as an index. This means the string '3' and the number 3 "
"reference the same element, but the integer 3 and the floating "
"point value 3.0 reference different elements, because their "
"canonical string values are different.",
3.2 : "We don't retain precision when creating canonical string "
"representations, so '3.0' and '3.00' both refer to the same value.",
3.20 : "By default, parsers MUST NOT raise an exception if two keys are "
"identical, however.",
3.20 : "When deserialized, the last key:value association encountered by "
"the parser will be the one in the deserialized object.",
3.20 : "This is to reduce the impact on running systems in loosely-coupled "
"systems. Implementations MAY implement an advisory service that "
"emits an event when a duplicate key is found.",
; In JavaScript, this would look something like this:
; var parser = new DDN();
; parser.on( 'duplicate', function( e ) { console.log( 'dup!' ); } );
; parser.parse( '{ "a":"once", "a":"upon", "a":"a", "a":"time" }' );
2012-12-21 : "Did we mention dates?",
2012-12-21T00:00:00Z : "Or times?",
2012-12 : "Or partial dates?",
2012-12-21T00:00 : "Or partial times?",
2012-12-21T00:00:00.000Z : "Times can have fractions of seconds.",
4294967297 : "Hold on. Did I just specify an integer greater than 32 bits?",
18446744073709551617 : "Why yes, I believe I did.",
79228162514264337593543950336 : "Uh oh. This is getting freaky.",
4 : "Okay. Why does that work? Shouldn't that throw an error?",
4.1 : "You would think it would. But it doesn't, because DDN describes "
"a transfer syntax, not a type system.",
4.2 : "This means DDN defines a parser that knows how to tell where "
"numbers and strings and dates and key:value pairs begin and end, "
"but it doesn't mean it requires ints be 32 bits wide. (more on "
"this later.)",
5.0 : <<EOS
One of the coolest things Bash does is support Here Docs. This means that when
the parser sees the sequence << (less than, less than) followed by a symbol,
followed by a new line, it interprets everything until newline - symbol as a
string.
So we're actually defining a string right here. And as long as we don't have
newline followed by EOS, it's going to keep on shoveling characters into the
string.
We can even add bits of DDN syntax. It doesn't matter. { we're in a string }
EOS
"",
5.1 : "But once we hit that newline-symbol sequence, we're back into "
"regular parsing mode.",
5.2 : <<"And you can have spaces in your symbols"
Bonus points if you figured out what the quotes around the symbol do.
And you can have spaces in your symbols,
5.3 : <<2012-12-21T00:00:00.000Z
Okay. this looks like the Here-doc initiator is a date object. It's not. We're
not that crazy. It's just a string.
2012-12-21T00:00:00.000Z,
5.4 : <<2012-12-21T00:00:00.000Z
You can re-use here-doc initiator strings. <<BUT_THEY_DON'T_NEST
2012-12-21T00:00:00.000Z,
5.5 : <<EOS,
You can put a comma in your here-doc terminator, but that looks very, very
confusing, IMHO. In the line below one comma is part of the here-doc symbol and
the next comma is there because i need a comma between this array element and
the next.
EOS,,
6.0 : "So... what about arrays? Yes. We have them.",
6.1 : [ 'this', 'is', 'a', 'typical', 'array', 0 ],
6.2 : "Each element in an array can be any type.",
7.0 : "But we have 'packed arrays' to represent arrays with all the same "
"type of thing.",
7.1 : [[ 'this', 'is', 'a', 'packed', 'array', 'with', 'just', 'strings']],
7.2 : [[ 0xFF, 0xFE, 0xFD ]], ; this is a packed array of 8 bit chars.
7.3 : [[ 16 | 1, 2, 3, 4 ]], ; this is a packed array of 16-bit values.
7.4 : "the number in between the double square brace and the bar "
"can be any numeric value that's a multiple of 8. (defaulting "
"to 8.)",
8.0 : [[ | TGludXggaGVsaXVtIDMuMi4wLTQtNjg2LXBhZSAjMSBTTVAgRGViaWFuIDMuMi42
My0yK2RlYjd1MSBpNjg2IEdOVS9MaW51eAo= ]],
8.1 : "if there's no value between the double-square-brace and the bar, "
"we assume it's base64."
}
.small
[
"so this is weird. we just terminated the associative array and are ",
"starting a new 'regular' array. What's up with that? ",
"",
"Well. It turns out, the parser will return an array of objects if it sees ",
"more than one."
]
.tiny
32
.large
<<EOS
So if you parsed this, you would get five objects in an array: an associative
array, a regular array, an integer, a string and another associative array.
You're probably also wondering what all those .small, .tiny, .large symbols
are. Remember I said DDN doesn't specify the size of integers? That's only
partially true. The parser doesn't REQUIRE values to fit in a specific size,
but it can communicate to the receiver its intent to follow certain type
sizes.
By default, the parser is in "indeterminate mode." This means there are no
limits to the sizes of things it transmits. If you include the symbol '.tiny'
in the parse stream, you are signaling your intent to only send 'tiny' data.
Tiny integers are 8 bits wide. Tiny floats are 16 bits (half precision.)
Tiny strings are no more than 255 characters long. small and large modes have
these limits:
        integer   float
        -------   -------
small   32 bit    64 bit (double precision)
large   64 bit    128 bit (quad precision)
EOS
.indeterminate
{
0 : "So there's one more thing to talk about. And it's going to annoy a ",
1 : "lot of people cause it makes parsing a little harder.",
2 : "If you put a bar character between two associative arrays, it ",
3 : "merges them."
}
|{
4 : "So... the contents of this associative array are merged into the ",
5 : "previous associative array.",
6 : null
}
|{
6 : "It's okay to have the same keys in the two arrays, the keys defined ",
7 : "later replace the keys defined earlier. So the 6:null key:value pair ",
8 : "in the array above would be replaced by the 6:string pair from this ",
9 : "array."
}
|{
10 : "It's a bit of a pain to code this on 8 bit microcontrollers, so it's ",
11 : "disabled in .tiny mode."
}
|{
12 : "The main reason for this feature is to enable 'poor man's journaling.'",
13 : "While an 8-bit micro might not want to parse it, it's not too bad ",
14 : "for a typical 32 or 64 bit system. So your 32 bit system would speak ",
15 : ".tiny to the 8 bit system. But the 8 bit system would speak .small to ",
16 : "the 32/64 bit server. And since it's speaking to a system that ",
17 : "doesn't have a problem parsing it, the microcontroller has the ",
18 : "option of having the remote system merge the associative arrays."
}
@OhMeadhbh

Hey Kent. thanks for the comments. let me start with a list of problems i'm trying to solve (besides the obvious one of communicating a serialized dynamic data structure.)

  1. Be a super-set of JSON. For historical reasons, we want valid JSON blobs to be valid DDN blobs, and to be interpreted in the same way. (though obviously
  2. Enhance readability with comments. JSON is used in a number of places as a configuration format. it wasn't REALLY intended for that, but it's a consequence of ubiquity. adding comments gives the ability to help contextualize elements in the serialized data structure.

Having both "to the end of the line" comments and bounded comments, and the choice of their delimiters, is really just a matter of personal taste. I originally used parentheses for bounded delimiters, but most people found that a little distracting (except FORTH programmers.)

  1. Enhance human comprehension with auto-catenation of strings & here-docs. Here-docs are somewhat useful when using a program to construct some data: Just emit the here-doc start, emit the data, then emit the here-doc end. But honestly, the reason I added it is I am often putting HTML in JSON. Yes. It is bad form. But there are times when HTML rendered text like "warning: You're about to do something dangerous." is the data you're trying to move. This example is easy enough, but when you decide to include a longer string (like a description of why something is dangerous,) you wind up with strings that span lines in your text editor and are a little less easy to understand.

So... for example... consider this JSON:

{
"success": false,
"description": "

The capability your system provided has expired.

For more information on web capabilities, please see <a href=\"http://example.com/docs/webcaps.html\">A Brief Introduction to Web Capabilities

"
}

And compare it to:

{
"success": false,
"description":
"

"
"The capability your system provided has expired."
"

"
"

"
"For more information on web capabilities, please see "
"<a href=\"http://example.com/docs/webcaps.html\">A Brief Introduction to Web Capabilities"
"

"
}

or even:

{
"success": false,
"description": <<EOS

The capability your system provided has expired.

For more information on web capabilities, please see A Brief Introduction to Web Capabilities

EOS
}

  1. Move canonicalization of type literals to the transfer syntax. That's actually the real reason for the "any type as a key." In DDN, we don't use the object itself as the key, but its canonical string representation. I should probably have also pointed out that true (the boolean true literal) is considered the same key as "true" (the string with the characters t-r-u-e) since both canonicalize to the same value. (it's the same as the 3 & "3" example above.)
  2. Define a syntax, not a type system. One of the things that drove a small amount of discord in LLSD was the exact definition of the abstract type system. The more I thought about it, the more I thought a transfer syntax should not dictate type behaviour. That being said, you have to have agreement on either side of the pipe with respect to max values otherwise you get 8 bit controllers having to store 64 bit floats just to remain compatible with the transfer syntax.
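To make the key canonicalization concrete, here is a minimal Python sketch. It is not the JavaScript or z80 implementation mentioned in this thread. The function names `canonical_key` and `store` are hypothetical, and the exact canonical float format is an assumption (Python's `str()` happens to give "3.0" for both 3.0 and 3.00, which matches the examples in the gist). Dates are left out here.

```python
def canonical_key(value):
    """Return the string a DDN-style parser might use as the dictionary key."""
    if value is None:
        return "null"
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float, str)):
        # str(3) -> "3", str(3.0) and str(3.00) -> "3.0", strings pass through.
        return str(value)
    raise TypeError(f"not a primitive key type: {type(value).__name__}")


def store(table, key, value):
    """Insert under the canonical key; a later duplicate replaces an earlier one."""
    table[canonical_key(key)] = value


table = {}
store(table, 3, "integer three")
store(table, "3", "string three")   # same canonical key as the integer 3
store(table, 3.0, "float three")    # different canonical key: "3.0"
store(table, True, "boolean true")
```

After these calls the table holds three entries, because the integer 3 and the string "3" collapse onto one key and the last one stored wins.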

The directives (.tiny, .small, .large, .indeterminate) are used to signal the sender's intent not to send values that violate certain type constraints. DDN doesn't REQUIRE endpoints to adhere to this promise, but it does allow system builders to signal their intent to the consumer of the serialized form. This is in keeping with the "provide mechanism, not policy" concept.
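A receiver that wants to check whether a sender kept its promise could do something like the following Python sketch. The integer widths and the 255-character tiny string limit come from the gist. The name `honours_mode` is hypothetical, and the rule "representable in N bits, signed or unsigned" is an assumption extrapolated from the tiny-mode rule discussed later in this thread. Float limits are not checked.

```python
# Integer widths per mode, taken from the gist; None means no limit.
INTEGER_BITS = {".tiny": 8, ".small": 32, ".large": 64, ".indeterminate": None}
TINY_MAX_STRING = 255  # tiny strings are no more than 255 characters long


def honours_mode(value, mode):
    """Return True if value stays within the limits promised by mode."""
    bits = INTEGER_BITS[mode]
    if bits is None:
        return True  # indeterminate mode: no limits
    if isinstance(value, bool):
        return True  # booleans have no width limit in this sketch
    if isinstance(value, int):
        # Assumption: the value must fit in N bits, read as signed or unsigned.
        return -(1 << (bits - 1)) <= value < (1 << bits)
    if isinstance(value, str) and mode == ".tiny":
        return len(value) <= TINY_MAX_STRING
    return True  # floats, dates and containers are not checked here
```

Since DDN does not require endpoints to keep the promise, what to do when `honours_mode` returns False is a policy decision for the receiving system.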

  1. Support "poor man's journaling." I actually use this to store config information. Starting with a default value, followed by updates. For example, here's my ipaddr config for one of my machines:

{
"en0": {
"auto": false,
"type": "dhcp"
}
}

|{
"en0": {
"auto": true
},
"wlan0": {
"auto": true,
"type": "dhcp"
}
}
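Here is a rough Python sketch of what the bar merge does to the config above. The gist only says that keys defined later replace keys defined earlier, which reads as a shallow, top-level replacement. Whether nested associative arrays are merged recursively is not stated, so the `deep` flag is purely an assumption added for illustration, and `merge` is a hypothetical name.

```python
def merge(base, update, deep=False):
    """Merge update into a copy of base; later keys replace earlier ones."""
    result = dict(base)
    for key, value in update.items():
        if deep and isinstance(value, dict) and isinstance(result.get(key), dict):
            # Assumed behaviour, not confirmed by the gist: recurse into nested arrays.
            result[key] = merge(result[key], value, deep=True)
        else:
            result[key] = value
    return result


defaults = {"en0": {"auto": False, "type": "dhcp"}}
update = {"en0": {"auto": True}, "wlan0": {"auto": True, "type": "dhcp"}}

shallow = merge(defaults, update)           # en0 is replaced wholesale
deep = merge(defaults, update, deep=True)   # en0 keeps "type" from the defaults
```

With the shallow rule, "en0" loses its "type" entry; with the recursive variant it keeps it. The journaling use case depends on which of the two an implementation picks.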

And then taking points one by one...

  1. i've implemented this parser on a z80 based micro. it's not NEARLY as complicated as XML, though slightly more complicated than JSON. Implementations of the previous version of this spec exist in javascript and z80 assembly.
  2. Should probably have been clearer about the "any type as a key." It turns out that Self and Smalltalk let you use any object as a key in a dictionary, but as mentioned above, we're actually converting types to canonicalized strings before using them as keys. The real reason we're doing it is actually political. Sheesh. a group with four programmers and we still have politics.
  3. We actually use RFC3339, which is just a touch more restrictive than ISO8601. Sure enough, I didn't add that reference to the original doc. Date strings are canonicalized to UTC, so the following two timestamps actually represent the same key:

"2014-12-17T14:00:05Z"
"2014-12-17T13:00:05-01:00"
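A small Python sketch of that canonicalization, using only the standard library. The name `canonical_timestamp` and the exact output format (whole seconds, trailing "Z") are assumptions. The sketch drops fractional seconds and does not handle partial dates or leap seconds (Python's `datetime` cannot represent a seconds value of 60).

```python
from datetime import datetime, timezone


def canonical_timestamp(text):
    """Convert an RFC 3339 timestamp string to a UTC string usable as a key."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalise it.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        raise ValueError("timestamp needs a UTC offset")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


first = canonical_timestamp("2014-12-17T14:00:05Z")
second = canonical_timestamp("2014-12-17T13:00:05-01:00")
```

Both calls produce the same string, so both timestamps would index the same element.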

Also. should probably point out that support for leap seconds is currently a MAY and not a MUST. We expect the underlying system to properly interpret leap seconds. Interestingly, one of the places we used the "any type as a key" was in an array of leap seconds for which a particular action had taken place:

{
2012-06-30T23:59:60Z: true,
1997-06-30T23:59:60Z: false
}

  1. Too many comment types? I don't think so. I really wanted to use parentheses for bounded comments as a shout-out to my days as a FORTH programmer, but what shred of sanity i have left bade me abandon this idea. The thing I really hate is supporting comment digraphs.
  2. No no. you can do strings delimited with single quotes. I didn't explicitly mention it, but 6.1 is an array of strings delimited w/ single quotes.
  3. The purpose of the here-doc is to make it easier to include text w/ line-feeds, the same as python triple-quotes. It actually doesn't complicate the parser that much. if you see [ 0x3C, 0x3C, ..., 0x0A ] in the input and you're not already in a string, that means begin of heredoc. a newline + identifier ends the heredoc. the newlines are part of the delimiter, so if you want a leading newline in the string, do this:

<<FOO

There's a leading newline right before this line.
FOO

For a trailing newline, do this:

<<FOO
There's a trailing newline right after this line.

FOO

And this has neither:

<<FOO
Neither a trailer nor a leader be.
FOO

Which is, IMHO, a little easier to comprehend than python's

"""Oh hey, this line has a trailing newline."""

"""This line does not have a trailing newline"""

or is it

"""This line does not have a trailing newline"""

I can never remember if python triple-quote strings require you to escape double quotes or not. But given enough time, python's
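The here-doc rule described above (initiator, then everything up to newline + identifier) can be sketched in a few lines of Python. `read_heredoc` is a hypothetical helper, not the deployed parser. It treats everything between `<<` and the first newline as the symbol, does not give quoted symbols any special treatment, and would stop early if a body line happened to begin with the symbol.

```python
def read_heredoc(text):
    """Return (body, rest) for text that starts with a <<SYMBOL here-doc."""
    if not text.startswith("<<"):
        raise ValueError("not a here-doc initiator")
    header_end = text.index("\n")      # raises ValueError if there is no newline
    symbol = text[2:header_end]
    terminator = "\n" + symbol
    # Search from the initiator's own newline so an empty body also works.
    end = text.index(terminator, header_end)
    body = text[header_end + 1:end]    # the delimiting newlines are not part of the body
    rest = text[end + len(terminator):]
    return body, rest


leading, _ = read_heredoc("<<FOO\n\nThere's a leading newline right before this line.\nFOO")
trailing, _ = read_heredoc("<<FOO\nThere's a trailing newline right after this line.\n\nFOO")
neither, _ = read_heredoc("<<FOO\nNeither a trailer nor a leader be.\nFOO")
```

The three calls mirror the three FOO examples: the newline after the initiator and the newline before the terminator are consumed as delimiters, and any extra blank line ends up in the string.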

  1. I'll admit. the auto-catenation of strings is REALLY there because i'm lazy and frequently write code that doesn't want to maintain state about whether it's in the middle, beginning or end of a constructed string. It just so happens it improves readability a touch, but the real reason is programmer laziness.

Turns out I don't really personally need auto catenation of arrays. and now that i think about it, the auto-catenation rules make it so you can't have two vector types in sequence at the top level. Maybe requiring the auto-catenation character for strings as well as arrays? hmm... i have to evangelize that change since it will require changing deployed code, but i think an argument could be made for explicitly identifying where concatenation occurs.
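As an illustration of the "lazy producer" argument, here is a Python sketch where the emitter writes each chunk as its own quoted string without tracking state, and the reader glues adjacent strings back together. `emit_chunks` and `read_adjacent_strings` are hypothetical names. The sketch leans on Python's `json` module for string quoting, on the assumption that JSON string literals are valid DDN strings, and it ignores comments and arrays.

```python
import json


def emit_chunks(chunks):
    """Write each chunk as its own quoted string, separated only by white space."""
    return "\n".join(json.dumps(chunk) for chunk in chunks)


def read_adjacent_strings(text):
    """Concatenate consecutive double-quoted strings; return (value, end position)."""
    decoder = json.JSONDecoder()
    parts, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text) or text[pos] != '"':
            break  # anything that is not another string ends the run
        value, pos = decoder.raw_decode(text, pos)
        parts.append(value)
    return "".join(parts), pos


wire = emit_chunks(["When the parser sees multiple vectors next to each other, it ",
                    "simply concatenates them."])
joined, _ = read_adjacent_strings(wire)
```

A variant that required an explicit concatenation character, as floated above, would only change the loop condition in `read_adjacent_strings`.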

  1. It actually simplifies the parser on the embedded side in exchange for a mild increase in complication on the server side. What I didn't mention here is that it's outside the scope of this doc what embedded devices do when they encounter non-tiny inputs. But our systems reset the connection, causing the server to not receive a response and raise an error condition.
  2. Right. signed values. Our code currently interprets a tiny -1 to be the same as 0xFF and 255. So the rule is "it has to be representable in 8 bits."
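The "representable in 8 bits" rule can be written down as a short Python sketch. `tiny_bits` is a hypothetical name, and the accepted range of -128 to 255 is an assumption derived from reading the rule as "fits in 8 bits, signed or unsigned".

```python
def tiny_bits(value):
    """Return the 8-bit pattern for a tiny integer, or raise if it does not fit."""
    if not -128 <= value <= 255:
        raise ValueError(f"{value} is not representable in 8 bits")
    # Masking maps negative values onto their two's-complement bit pattern.
    return value & 0xFF


minus_one = tiny_bits(-1)
all_ones = tiny_bits(0xFF)
```

Under this reading, -1, 0xFF and 255 all yield the same bit pattern, which matches the behaviour described above.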

@OhMeadhbh

was also just thinking... DDN as described here is a transfer syntax, maybe we should rename it to be something like "FOO transfer syntax" and have a different document describing the processing expectations so there's a clear dividing line between transfer syntax and type expectations.
