This specification describes a canonical way to losslessly encode XML in KDL. While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with an XML-consuming or -emitting service.
This is version 1.0.0 of XiK.
XML-in-KDL (XiK from now on) is a KDL microsyntax for losslessly encoding XML into a KDL document. XML and KDL, luckily, have very similar data models (KDL is almost a superset of XML), so it's quite straightforward to encode most XML documents into KDL.
XML has four types of nodes, corresponding to certain KDL constructs:
- Elements, which have an element name, zero or more attributes, and zero or more children. These are encoded directly as KDL nodes, using the node name, properties, and child nodes.
- Raw text. In "pure" XML dialects, where raw text only appears as the sole child of an element (never mixed with other elements as siblings), this is generally encoded as a final string argument in a KDL node; in "mixed" XML dialects, it can be encoded as a special KDL node with the name `-`.
- Processing Instructions and Comments. These are encoded as raw XML syntax embedded as KDL node names.
XML elements and KDL nodes have a direct correspondence. In XiK, an XML element is encoded in KDL by:
- making the element name the KDL node name
- making the attributes into KDL properties
- making the child nodes into KDL child nodes
For example, the XML `<element foo="bar"><child baz="qux" /></element>` is encoded into XiK as `element foo="bar" { child baz="qux" }`.
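The three steps above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustrative sketch, not part of XiK: `kdl_string` and `encode_element` are hypothetical names, the escaper covers only a handful of common escapes, text content is ignored, and element and attribute names are assumed to be valid bare KDL identifiers.

```python
import xml.etree.ElementTree as ET


def kdl_string(value: str) -> str:
    """Quote a string as a KDL string (hypothetical helper, common escapes only)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode_element(element: ET.Element) -> str:
    """Encode an element that has only attributes and element children."""
    # The element name becomes the KDL node name.
    parts = [element.tag]
    # Each XML attribute becomes a KDL property.
    parts += [f"{name}={kdl_string(value)}" for name, value in element.attrib.items()]
    head = " ".join(parts)
    # Each child element becomes a KDL child node.
    children = [encode_element(child) for child in element]
    if not children:
        return head
    return f"{head} {{ {'; '.join(children)} }}"


xml = '<element foo="bar"><child baz="qux" /></element>'
print(encode_element(ET.fromstring(xml)))
# prints: element foo="bar" { child baz="qux" }
```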
Raw text contents of an element can be encoded in two possible ways.
If the element contains only text, it should be encoded as a final string unnamed argument. For example, the XML `<a href="http://example.com">here's a link</a>` can be encoded as `a href="http://example.com" "here's a link"`.
If the element contains mixed text and element children, the text can be encoded as a KDL node with the name `-` with a single string unnamed argument. For example, the XML `<span>some <b>bold</b> text</span>` can be encoded as `span { - "some "; b "bold"; - " text" }`.
An element that contains only text is allowed to encode it as `-` children. For example, `<span>foo</span>` may be encoded as `span { - "foo" }` instead of `span "foo"`. However, an element cannot mix the "final string argument" with child nodes; `span "foo" { b "bar" }` is an invalid encoding of `<span>foo<b>bar</b></span>`. (It must be encoded as `span { - "foo"; b "bar" }`.)
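The text rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `kdl_string` and `encode` are hypothetical names, names are assumed to be valid bare KDL identifiers, and the sketch relies on `xml.etree.ElementTree` exposing text before the first child as `.text` and text after a child as that child's `.tail`. It always picks the final-argument form for text-only elements, although the `-` form is equally valid there.

```python
import xml.etree.ElementTree as ET


def kdl_string(value: str) -> str:
    """Quote a string as a KDL string (hypothetical helper, common escapes only)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode(element: ET.Element) -> str:
    """Encode an element, choosing between the two text encodings."""
    head = " ".join(
        [element.tag]
        + [f"{name}={kdl_string(value)}" for name, value in element.attrib.items()]
    )
    text = element.text or ""
    if len(element) == 0:
        # Text-only (or empty) element: the text is the final string argument.
        return f"{head} {kdl_string(text)}" if text else head
    # Element children present: every run of text becomes a `-` node,
    # kept in document order so the encoding stays lossless.
    children = []
    if text:
        children.append(f"- {kdl_string(text)}")
    for child in element:
        children.append(encode(child))
        if child.tail:
            children.append(f"- {kdl_string(child.tail)}")
    return f"{head} {{ {'; '.join(children)} }}"


print(encode(ET.fromstring("<span>some <b>bold</b> text</span>")))
# prints: span { - "some "; b "bold"; - " text" }
```

Because the sketch never emits a string argument on a node that also has children, it cannot produce the invalid `span "foo" { b "bar" }` form.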
XML namespaces are encoded the same as in XML: the node name simply contains a `:` character. Note that KDL identifier syntax allows `:` directly in an identifier, so a name like `xml:space` or `xlink:href` is a valid node or property name.
XML processing instructions and comments are encoded into KDL by putting their entire XML syntax into a node name. For example, `<!-- comment! -->` is encoded as the node `"<!-- comment! -->"`, with no arguments, properties, or children.
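As a rough Python sketch of this rule, the helpers below rebuild the XML source of a comment or processing instruction and quote it as a KDL string to form the node name. The function names (`kdl_string`, `encode_comment`, `encode_processing_instruction`) are hypothetical, the escaper handles only common escapes, and the processing-instruction layout (target, one space, data) is an assumption about how the original was written.

```python
def kdl_string(value: str) -> str:
    """Quote a string as a KDL string (hypothetical helper, common escapes only)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def encode_comment(text: str) -> str:
    """Encode an XML comment body as a KDL node consisting of only a name."""
    return kdl_string(f"<!--{text}-->")


def encode_processing_instruction(target: str, data: str = "") -> str:
    """Encode an XML processing instruction as a KDL node consisting of only a name."""
    body = f"{target} {data}" if data else target
    return kdl_string(f"<?{body}?>")


print(encode_comment(" comment! "))
# prints: "<!-- comment! -->"
```

Quoting is needed because characters such as `<`, `>`, and spaces cannot appear in a bare KDL identifier.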