tabatkins/xml-in-kdl.md

## xml-in-kdl.md

      
# XML-in-KDL (XiK)

This specification describes a canonical way to losslessly encode XML in KDL. While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with an XML-consuming or -emitting service.
This is version 1.0.0 of XiK.
XML-in-KDL (XiK from now on) is a KDL microsyntax for losslessly encoding XML into a KDL document. Luckily, XML and KDL have very similar data models (KDL is almost a superset of XML), so it's quite straightforward to encode most XML documents into KDL.
XML has four types of nodes, corresponding to certain KDL constructs:

- Elements, which have an element name, zero or more attributes, and zero or more children. These are encoded directly as KDL nodes, using the node name, properties, and child nodes.
- Raw text. In "pure" XML dialects, where raw text only appears as the sole child of an element (never mixed with other elements as siblings), this is generally encoded as a final string argument of a KDL node; in "mixed" XML dialects, it can be encoded as a special KDL node with the name `-`.
- Processing Instructions and Comments. These are encoded as raw XML syntax embedded as KDL node names.


XML elements and KDL nodes have a direct correspondence. In XiK, an XML element is encoded in KDL by:

- making the element name the KDL node name
- making the attributes into KDL properties
- making the child nodes into KDL child nodes

For example, the XML `<element foo="bar"><child baz="qux" /></element>` is encoded into XiK as `element foo="bar" { child baz="qux" }`.
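
The element mapping above can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. This is only an illustration, not part of XiK itself: `kdl_string` and `encode_element` are hypothetical helper names, the sketch assumes every element and attribute name is usable as a bare KDL identifier, and it deliberately ignores text content.

```python
import xml.etree.ElementTree as ET


def kdl_string(value: str) -> str:
    """Quote a Python string as a KDL string (hypothetical helper)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode_element(elem: ET.Element) -> str:
    """Encode an element's name, attributes, and element children as one-line XiK.

    Assumption: names are valid bare KDL identifiers. Text is ignored here.
    """
    # Element name becomes the node name.
    parts = [elem.tag]
    # Attributes become properties.
    parts += [f"{name}={kdl_string(value)}" for name, value in elem.attrib.items()]
    # Child elements become child nodes.
    children = [encode_element(child) for child in elem]
    if children:
        parts.append("{ " + "; ".join(children) + " }")
    return " ".join(parts)


xml = '<element foo="bar"><child baz="qux" /></element>'
print(encode_element(ET.fromstring(xml)))
# prints: element foo="bar" { child baz="qux" }
```
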
Raw text contents of an element can be encoded in two possible ways.
If the element contains only text, it should be encoded as a final unnamed string argument. For example, the XML `<a href="http://example.com">here's a link</a>` can be encoded as `a href="http://example.com" "here's a link"`.
If the element contains mixed text and element children, the text can be encoded as a KDL node with the name `-` and a single unnamed string argument. For example, the XML `<span>some <b>bold</b> text</span>` can be encoded as `span { - "some "; b "bold"; - " text" }`.
An element that contains only text may also encode it as a `-` child. For example, `<span>foo</span>` may be encoded as `span { - "foo" }` instead of `span "foo"`. However, an element cannot mix the final string argument with child nodes; `span "foo" { b "bar" }` is an invalid encoding of `<span>foo<b>bar</b></span>`. (It must be encoded as `span { - "foo"; b "bar" }`.)
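
As a rough sketch of the two text rules, the Python below picks the final-argument form for text-only elements and `-` nodes as soon as an element has element children. The names `kdl_string` and `encode` are hypothetical, names are assumed to be valid bare KDL identifiers, and ElementTree's `text`/`tail` attributes are used to recover the text runs around child elements.

```python
import xml.etree.ElementTree as ET


def kdl_string(value: str) -> str:
    """Quote a Python string as a KDL string (hypothetical helper)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode(elem: ET.Element) -> str:
    """Encode an element, including its text, as one-line XiK."""
    head = [elem.tag]
    head += [f"{name}={kdl_string(value)}" for name, value in elem.attrib.items()]

    if len(elem) == 0:
        # No element children: text (if any) is the final string argument.
        if elem.text:
            head.append(kdl_string(elem.text))
        return " ".join(head)

    # Element children present: each run of text becomes a "-" node,
    # never a string argument on the parent.
    body = []
    if elem.text:
        body.append("- " + kdl_string(elem.text))
    for child in elem:
        body.append(encode(child))
        if child.tail:  # text that follows this child inside the parent
            body.append("- " + kdl_string(child.tail))
    return " ".join(head) + " { " + "; ".join(body) + " }"


print(encode(ET.fromstring("<span>some <b>bold</b> text</span>")))
# prints: span { - "some "; b "bold"; - " text" }
```

Because the encoding is meant to be lossless, this sketch keeps whitespace-only text runs too, so pretty-printed XML will produce `-` nodes containing only newlines and indentation.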
XML namespaces are encoded the same way as in XML: the node name simply contains a `:` character. Note that KDL identifier syntax allows `:` directly in an identifier, so a name like `xml:space` or `xlink:href` is a valid node or property name.

XML processing instructions and comments are encoded into KDL by putting their entire XML syntax into a node name. For example, `<!-- comment! -->` is encoded as the node `"<!-- comment! -->"`, with no arguments, properties, or children.
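
A minimal Python sketch of this rule: rebuild the raw XML syntax and quote it as a KDL string so it can serve as a node name. The helpers `kdl_string`, `encode_comment`, and `encode_pi` are hypothetical names chosen for illustration, and the `xml-stylesheet` instruction is just sample input.

```python
def kdl_string(value: str) -> str:
    """Quote a Python string as a KDL string (hypothetical helper)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode_comment(text: str) -> str:
    """Encode comment text as a node whose name is the full <!--...--> syntax."""
    return kdl_string(f"<!--{text}-->")


def encode_pi(target: str, data: str = "") -> str:
    """Encode a processing instruction as a node whose name is the full <?...?> syntax."""
    raw = f"<?{target} {data}?>" if data else f"<?{target}?>"
    return kdl_string(raw)


print(encode_comment(" comment! "))
# prints: "<!-- comment! -->"
print(encode_pi("xml-stylesheet", 'href="style.css"'))
```

Quotes inside the XML syntax have to be escaped in the KDL string, which is why the second call goes through `kdl_string` rather than simply wrapping the text in quotes.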