@tabatkins
Created July 18, 2021 21:52
XML in KDL

XML-in-KDL (XiK)

This specification describes a canonical way to losslessly encode XML in KDL. While this isn't a very useful thing to want to do on its own, it's occasionally useful when using a KDL toolchain while speaking with an XML-consuming or -emitting service.

This is version 1.0.0 of XiK.

XML-in-KDL (XiK from now on) is a KDL microsyntax for losslessly encoding XML into a KDL document. XML and KDL, luckily, have very similar data models (KDL is almost a superset of XML), so it's quite straightforward to encode most XML documents into KDL.

XML has four types of nodes, corresponding to certain KDL constructs:

  • Elements, which have an element name, zero or more attributes, and zero or more children. These are encoded directly as KDL nodes, using the node name, properties, and child nodes.
  • Raw text. In "pure" XML dialects, where raw text only appears as the sole child of an element (never mixed with other elements as siblings), this is generally encoded as a final string argument in a KDL node; in "mixed" XML dialects, it can be encoded as a special KDL node with the name -.
  • Processing Instructions and Comments. These are encoded as raw XML syntax embedded as KDL node names.

XML elements and KDL nodes have a direct correspondence. In XiK, an XML element is encoded in KDL by:

  • making the element name the KDL node name
  • making the attributes into KDL properties
  • making the child nodes into KDL child nodes

For example, the XML <element foo="bar"><child baz="qux" /></element> is encoded into XiK as element foo="bar" { child baz="qux" }.
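The element mapping above can be sketched in Python. This is a minimal illustration, not a reference implementation: it uses the standard library's xml.dom.minidom parser, the helper names kdl_string and encode_element are made up for this example, and it assumes every XML name is already a valid bare KDL identifier. Text, comments, and processing instructions are ignored here.

```python
from xml.dom import minidom


def kdl_string(value):
    """Quote a Python string as a KDL string (hypothetical helper)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")
    return '"' + escaped + '"'


def encode_element(elem):
    """Encode an element (attributes and element children only) as one XiK node."""
    # The element name becomes the KDL node name.
    parts = [elem.tagName]
    # Each attribute becomes a KDL property.
    for name, value in elem.attributes.items():
        parts.append(f"{name}={kdl_string(value)}")
    # Each child element becomes a KDL child node.
    children = [c for c in elem.childNodes if c.nodeType == c.ELEMENT_NODE]
    if children:
        parts.append("{ " + "; ".join(encode_element(c) for c in children) + " }")
    return " ".join(parts)


doc = minidom.parseString('<element foo="bar"><child baz="qux" /></element>')
print(encode_element(doc.documentElement))
```

Run on the example above, this prints element foo="bar" { child baz="qux" }.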

Raw text contents of an element can be encoded in two possible ways.

If the element contains only text, it should be encoded as a final string unnamed argument. For example, the XML <a href="http://example.com">here's a link</a> can be encoded as a href="http://example.com" "here's a link".

If the element contains mixed text and element children, each run of text can be encoded as a KDL node named - that takes a single unnamed string argument. For example, the XML <span>some <b>bold</b> text</span> can be encoded as span { - "some "; b "bold"; - " text" }.

An element that contains only text is allowed to encode it as - children. For example, <span>foo</span> may be encoded as span { - "foo" } instead of span "foo". However, an element cannot mix the "final string argument" with child nodes; span "foo" { b "bar" } is an invalid encoding of <span>foo<b>bar</b></span>. (It must be encoded as span { - "foo"; b "bar" }.)
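The choice between the two text encodings can be sketched as follows. Again this is only an illustration under assumptions: it uses xml.dom.minidom, the names kdl_string and encode_node are hypothetical, element names are assumed to be valid bare KDL identifiers, and CDATA sections, comments, and processing instructions are skipped. For text-only elements it always picks the final string argument, although the - form is equally valid.

```python
from xml.dom import minidom


def kdl_string(value):
    """Quote a Python string as a KDL string (hypothetical helper)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")
    return '"' + escaped + '"'


def encode_node(node):
    """Encode a text node or an element, choosing the text form per the rules above."""
    if node.nodeType == node.TEXT_NODE:
        # Text among siblings becomes a node named "-" with one string argument.
        return "- " + kdl_string(node.data)
    parts = [node.tagName]
    parts += [f"{name}={kdl_string(value)}" for name, value in node.attributes.items()]
    children = [c for c in node.childNodes
                if c.nodeType in (c.ELEMENT_NODE, c.TEXT_NODE)]
    if children and all(c.nodeType == c.TEXT_NODE for c in children):
        # Text-only element: a single final string argument, no child block.
        parts.append(kdl_string("".join(c.data for c in children)))
    elif children:
        # Mixed or element-only content: everything goes in the child block,
        # so the final string argument is never combined with children.
        parts.append("{ " + "; ".join(encode_node(c) for c in children) + " }")
    return " ".join(parts)


doc = minidom.parseString("<span>some <b>bold</b> text</span>")
print(encode_node(doc.documentElement))
```

For the mixed-content example above this prints span { - "some "; b "bold"; - " text" }, and <span>foo<b>bar</b></span> comes out as span { - "foo"; b "bar" } rather than the invalid span "foo" { b "bar" }.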

XML namespaces are encoded the same as XML: the node name simply contains a : character. Note that KDL identifier syntax allows : directly in an ident, so a name like xml:space or xlink:href is a valid node or property name.


XML processing instructions and comments are encoded into KDL by putting their entire XML syntax into a node name. For example, <!-- comment! --> is encoded as the node "<!-- comment! -->", with no arguments, properties, or children.
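A sketch of this rule in Python, assuming the input has been parsed with xml.dom.minidom; the function names kdl_string and encode_markup are hypothetical. Note that the parser exposes a processing instruction only as a target plus data, so this sketch rebuilds the syntax with a single space between them, which may differ from the whitespace in the original source.

```python
from xml.dom import minidom


def kdl_string(value):
    """Quote a Python string as a KDL string (hypothetical helper)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")
    return '"' + escaped + '"'


def encode_markup(node):
    """Encode a comment or processing instruction as a quoted KDL node name."""
    if node.nodeType == node.COMMENT_NODE:
        raw = "<!--" + node.data + "-->"
    elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        raw = "<?" + node.target + " " + node.data + "?>"
    else:
        raise ValueError("expected a comment or processing instruction")
    # The whole XML syntax is the node name; nothing else follows it.
    return kdl_string(raw)


doc = minidom.parseString("<root><!-- comment! --><?target some data?></root>")
for child in doc.documentElement.childNodes:
    print(encode_markup(child))
```

Because the node name contains characters such as < and spaces, it has to be written as a quoted string rather than a bare identifier, which is why the helper always quotes it.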
