(original: http://groups.yahoo.com/group/sml-dev/message/4729)
(archived: https://web.archive.org/web/20010603012942/http://groups.yahoo.com/group/sml-dev/message/4710)
From: "Clark C . Evans" <cce@c...>
Date: Fri May 11, 2001 8:50 pm
Subject: YAML Draft 0.1
With quite a bit of work, I've tried to come up with
a first pass at this proposal. It's at www.yaml.org,
below is the current text version. Your comments
would be very cool.
Thanks!
Clark
+---------------------------------------------------------------+
| Welcome to YAML, Draft 0.1 |
+---------------------------------------------------------------+
| YAML is a straightforward markup language, offering an |
| alternative to XML, borrowing ideas from C, HTML, Perl, and |
| Python. |
| |
| * YAML texts are brief and readable. |
| * YAML is very expressive and extensible. |
| * YAML has a simple stream based interface. |
| * YAML uses data structures native to your programming |
| language. |
| * YAML is easy to implement, perhaps too easy. |
| * YAML has a solid information model, no exceptions, no |
| mess. |
| |
+---------------------------------------------------------------+
| Key Concepts |
+---------------------------------------------------------------+
| YAML is founded on several key concepts from very successful |
| languages. |
| |
| * YAML handles whitespace much as HTML does. In YAML, |
| sequences of spaces, tabs, and carriage return characters |
| are folded into a single space during parsing. This |
| wonderful technique makes markup code readable by |
| enabling indentation without affecting the canonical form |
| of the content. |
| * YAML uses backslash-style escape sequences similar to C's. |
| In YAML, the backslash, \ , is used as an escape indicator. |
| As in C, \n is used to represent a new line, \t is used to |
| represent a tab, and \\ is used to represent the backslash. |
| In addition, since whitespace is folded, YAML introduces |
| \s to represent an additional space that is part of the |
| content and should not be folded. Further, a \ character |
| at the end of a line acts as a continuation marker, |
| allowing content to be broken into multiple lines without |
| introducing unwanted whitespace. |
| * YAML uses data typing similar to Perl's. In YAML, there |
| are three fundamental types of data: scalars, which are |
| indicated by a dollar ($) sign; maps/hashes, which are |
| indicated by a percent (%) sign; and lists/vectors, which |
| are indicated by an at (@) sign. Also like Perl, all node |
| names (variables) begin with one of these indicators. As a |
| result, YAML's internal memory based representation uses |
| your language's native map, list, and string constructs |
| rather than inventing its own object model. |
| * YAML uses block scoping similar to Python. In YAML, the |
| extent of a node is indicated by its child's nesting |
| level, i.e., what column it is in. Skeptical as you may |
| be, ask anyone who has worked with Python, and you will |
| hear that it makes the code more readable and less error |
| prone. Try it. It makes life easy. |
| |
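The folding and escape rules above can be sketched in a few lines of Python. This is only an illustration, not part of the draft. The name decode_scalar is made up, line breaks are assumed to fold like spaces and tabs, folded whitespace at either end of the content is assumed to be dropped, \r is borrowed from the XML conversion rules later in this draft, and unknown escapes are assumed to be errors.

```python
# Escape letters named in the draft; "\s" is a space that survives folding.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "s": " ", "\\": "\\"}
WHITESPACE = " \t\r\n"
FOLD = object()  # marker for one folded (non-literal) space


def decode_scalar(raw):
    """Fold whitespace runs and resolve backslash escapes in raw content."""
    parts = []
    i, n = 0, len(raw)
    while i < n:
        ch = raw[i]
        if ch in WHITESPACE:
            # Any run of unescaped whitespace folds into a single space.
            while i < n and raw[i] in WHITESPACE:
                i += 1
            parts.append(FOLD)
        elif ch == "\\":
            nxt = raw[i + 1:i + 2]
            if nxt in ("", "\r", "\n"):
                # Trailing backslash: continuation, so the line break and
                # the next line's indentation contribute nothing.
                i += 1
                while i < n and raw[i] in WHITESPACE:
                    i += 1
            elif nxt in ESCAPES:
                parts.append(ESCAPES[nxt])
                i += 2
            else:
                # Assumption: the draft does not say what to do here.
                raise ValueError("unknown escape: \\" + nxt)
        else:
            parts.append(ch)
            i += 1
    # Assumption: folded whitespace at the edges is not content.
    while parts and parts[0] is FOLD:
        parts.pop(0)
    while parts and parts[-1] is FOLD:
        parts.pop()
    return "".join(" " if p is FOLD else p for p in parts)
```

With this sketch, the two lines "no\" and "space" decode to "nospace", while "long" and "line" decode to "long line".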
+---------------------------------------------------------------+
| Example |
+---------------------------------------------------------------+
| Below is an example of an invoice expressed via YAML. |
| $invoice 00034843 |
| $date 12-JAN-2001 |
| %buyer |
|   $given-name Chris |
|   $family-name Dumars |
|   %address |
|     $line1 458 Wittigen's Way |
|     $line2 Suite #292 |
|     $city Royal Oak |
|     $state MI |
|     $postal 48046 |
| @order |
|   %product |
|     $id BL394D |
|     $desc Grade A, Leather Hide Basketball |
|     $price $450.00 |
|     $quantity 4 |
|   %product |
|     $id BL4438H |
|     $desc Super Hoop (tm) |
|     $price $2,392.00 |
|     $quantity 1 |
| $comments |
|   Mr. Dumars is frequently gone in the morning |
|   so it is best advised to try things in late |
|   afternoon. \nIf Joe isn't around, try his house\ |
|   keeper, Nancy Billsmer @ (734) 338-4338. |
| %delivery |
|   $method UZS Express Overnight |
|   $price $45.50 |
| $tax 0% |
| $total $4237.50 |
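Since the BNF is still to do, here is a rough Python sketch of how the column-based block scoping in an invoice like this might be read. Everything in it is an assumption for illustration: the name parse_block, the [indicator, name, content] node shape, indentation by spaces only, and the shortcut that any indented line not starting with an indicator continues the preceding scalar. A continuation line that happens to begin with $, @, or % would be misread, so a real parser would need the proper grammar. Escapes and folding inside scalars are left untouched.

```python
INDICATORS = "$@%"


def parse_block(text):
    """Turn indented YAML-draft lines into nested [indicator, name, content]."""
    root = []             # the document itself is a list of nodes
    stack = [(-1, root)]  # (column, children) of each open map/list node
    last_scalar = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        column = len(line) - len(line.lstrip(" "))
        if stripped[0] not in INDICATORS:
            # Assumed rule: plain text continues the most recent scalar.
            if last_scalar is None:
                raise ValueError("text outside of a scalar: " + stripped)
            last_scalar[2] = (last_scalar[2] + " " + stripped).strip()
            continue
        head, _, rest = stripped.partition(" ")
        indicator, name = head[0], head[1:] or None
        # Close every open node that sits at this column or deeper.
        while stack[-1][0] >= column:
            stack.pop()
        if indicator == "$":
            node = [indicator, name, rest.strip()]
            last_scalar = node
        else:
            node = [indicator, name, []]
            last_scalar = None
        stack[-1][1].append(node)
        if indicator != "$":
            stack.append((column, node[2]))
    return root
```

Feeding it a few lines such as "%buyer" followed by an indented "$given-name Chris" gives a "%" node named buyer whose content holds one "$" node.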
+---------------------------------------------------------------+
| Information Model |
+---------------------------------------------------------------+
| The information model is similar to XML, although it has many |
| significant differences. |
| Document  The starting production for YAML is a List. |
| List      An ordered sequence of zero or more Nodes. |
| Node      An ordered tuple having an optional Name followed |
|           by a mandatory Value. |
| Name      Identical to the Name production in the XML 1.0 |
|           specification. |
| Value     Exactly one of String, Map, or List. |
| String    A sequence of zero or more characters. A character |
|           is identical to the character defined in the Char |
|           production of the XML 1.0 specification. |
| Map       An unordered sequence of zero or more Nodes such |
|           that each Node's Name is unique within the |
|           sequence. There may be only a single node without a |
|           name in each map. |
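As a rough illustration of this model using native constructs, a Python sketch might look like the following. The helper names make_map and make_list are hypothetical, a Node is assumed to be a (name, value) pair with None standing in for a missing Name, and Names are not checked against the XML 1.0 Name production.

```python
def make_map(nodes):
    """Build a Map as a plain dict from (name, value) Nodes.

    Using None as the key of the unnamed node means the uniqueness check
    also enforces "only a single node without a name".
    """
    result = {}
    for name, value in nodes:
        if name in result:
            label = "the unnamed node" if name is None else f"name {name!r}"
            raise ValueError(f"{label} appears twice in one map")
        result[name] = value
    return result


def make_list(nodes):
    """Build a List: order is kept and repeated names are allowed."""
    return [(name, value) for name, value in nodes]


# A String is just a str; a Value is a str, a dict, or a list.
address = make_map([("city", "Royal Oak"), ("state", "MI")])
buyer = make_map([("given-name", "Chris"), ("address", address)])
document = make_list([("invoice", "00034843"), ("buyer", buyer)])
```

Python dicts happen to remember insertion order, but nothing in this sketch relies on it, in keeping with the Map being unordered.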
+---------------------------------------------------------------+
| Common XML Compatibility |
+---------------------------------------------------------------+
| Although the syntax is distinctly different, a restricted |
| subset of YAML can be used to provide an isomorphic image of |
| a Common XML text. This involves a few conventions layered |
| upon YAML. Following are the simple mapping conventions. |
| <x/>                     %x              An XML element can be |
|                            @             represented in YAML |
|                                          using a map node with |
|                                          an anonymous list |
|                                          child. |
| <x>text</x>              %x              An XML text node can |
|                            @             be represented in |
|                              $ text      YAML using an |
|                                          anonymous string node |
|                                          in the context of an |
|                                          anonymous list node. |
| <x att="value"/>         %x              An XML attribute node |
|                            $att value    is represented in |
|                            @             YAML using a named |
|                                          string node in the |
|                                          context of a map |
|                                          node. |
| <x><y/></x>              %x              An XML parent/child |
|                            @             element relationship |
|                              %y          can be represented in |
|                                @         YAML by placing the |
|                                          element's |
|                                          representation within |
|                                          the anonymous list |
|                                          node. |
| <x a="val">text<y/></x>  %x              Of course, these all |
|                            $a val        play together. |
|                            @ |
|                              $ text |
|                              %y |
|                                @ |
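The verbose conventions above could be sketched with Python's standard xml.etree.ElementTree module. This is a simplified illustration only: the function name to_verbose and the two-space indent are assumptions, text is stripped rather than escaped, and whitespace-only text is dropped.

```python
import xml.etree.ElementTree as ET

INDENT = "  "  # assumed; the draft does not fix an indentation width


def to_verbose(elem, depth=0):
    """Return the verbose YAML-draft lines for one XML element."""
    pad = INDENT * depth
    lines = [f"{pad}%{elem.tag}"]                # element -> map node
    for att, value in elem.attrib.items():
        lines.append(f"{pad}{INDENT}${att} {value}")  # attribute -> named string
    lines.append(f"{pad}{INDENT}@")              # children -> anonymous list
    inner = depth + 2
    if elem.text and elem.text.strip():
        lines.append(f"{INDENT * inner}$ {elem.text.strip()}")
    for child in elem:
        lines.extend(to_verbose(child, inner))
        # Text following a child element is also content of this element.
        if child.tail and child.tail.strip():
            lines.append(f"{INDENT * inner}$ {child.tail.strip()}")
    return lines


print("\n".join(to_verbose(ET.fromstring('<x a="val">text<y/></x>'))))
```

For the combined example this prints %x, then $a val and @ one level in, then $ text and %y inside the list, and finally the @ belonging to y.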
| This mapping has an abbreviated form, which is the default |
| conversion, although the more verbose form lends itself |
| better to generic processing. Thus, an XML to YAML-X converter |
| should offer both forms. |
| <x/>                     $x              If there are no |
|                                          children and no |
|                                          attributes, an XML |
|                                          element can be |
|                                          written as a named |
|                                          string node with zero |
|                                          characters. |
| <x>text</x>              $x text         If an XML element has |
|                                          only a single text |
|                                          node child with no |
|                                          attributes, then it |
|                                          can be represented |
|                                          using a named string |
|                                          node. |
| <x att="value"/>         %x              If an XML element |
|                            $att value    with attributes lacks |
|                                          children, the |
|                                          anonymous list node |
|                                          may be omitted. |
| <x><y/></x>              @x              An XML element with |
|                            $y            children and no |
|                                          attributes may be |
|                                          represented as a YAML |
|                                          list node. |
| When converting text nodes and attribute values from XML, |
| significant whitespace must be escaped using \r for carriage |
| return, \n for new line, \t for a tab, \s for an additional |
| space, and \\ for a backslash. By default, the conversion |
| should wrap content as described in the serialization section |
| below. Below are examples of how specific text nodes could be |
| converted. |
| <x>                      $x \ntext       In this case, a new |
| text</x>                                 line had to be |
|                                          escaped. |
| <x>long line</x>         $x              String content is |
|                            long          converted here using |
|                            line          multiple indented |
|                                          lines. No escaping |
|                                          here due to YAML's |
|                                          whitespace folding. |
|                                          The YAML version |
|                                          contains one |
|                                          significant space |
|                                          between "long" and |
|                                          "line". |
| <x>nospace</x>           $x              Here multiple lines |
|                            no\           are also used, |
|                            space         however a trailing |
|                                          escape \ indicates |
|                                          that the line break |
|                                          does not induce a |
|                                          significant space. |
| <x>a \ esc</x>           $x a \\ esc     Of course, \ in |
|                                          content must be |
|                                          escaped. |
| <x>                      $x              A bit more |
| text with 2  sp            \n text with  complicated. |
| </x>                       2\s sp\n |
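The escaping side of the conversion might be sketched in Python as below. The name escape_text is hypothetical, and two details are assumptions not settled by the draft: in a run of spaces the last one is left literal and the others become \s, and spaces at the very start or end of the text are all written as \s so that folding cannot swallow them. Word wrapping is not attempted here.

```python
import re


def escape_text(text):
    """Escape XML text so that whitespace folding cannot change it."""
    # Backslashes first, so the escapes added below are not doubled.
    out = text.replace("\\", "\\\\")
    out = out.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t")

    def escape_run(match):
        run = match.group(0)
        at_edge = match.start() == 0 or match.end() == len(match.string)
        if at_edge:
            # Assumption: edge spaces would be lost, so escape them all.
            return "\\s" * len(run)
        # One literal space survives folding; the extras become \s.
        return "\\s" * (len(run) - 1) + " "

    return re.sub(r" +", escape_run, out)
```

For example, text consisting of a line break followed by "text" comes out as \ntext, and "a \ esc" comes out as a \\ esc, matching the rows above.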
+---------------------------------------------------------------+
| Encoding |
+---------------------------------------------------------------+
| A YAML text may use the UTF16 or ISO 8859-1 character |
| encodings. YAML explicitly allows MIME headers to specify |
| alternative encodings and provide document level meta-data, |
| including a YAML version number. |
| |
| A YAML Parser should check for a UTF16 byte order mark. If it |
| is found, then the YAML text is encoded using UTF16, |
| surrogate pairs excepted. Otherwise, the parser should assume |
| 8-bit ISO Latin-1 (ISO 8859-1). The default is not UTF8 since |
| UTF8 is not a simple single-byte encoding. Thus, a parser must |
| support both UTF16 and ISO 8859-1 and is not required to |
| support any other encodings. |
| |
| The parser should identify the first line of the text |
| starting with an indicator ($, @, or %). All lines leading up |
| to this point are collectively called the header; this line |
| and all following lines are collectively called the body. The |
| header should be examined for MIME header fields. |
| |
| If MIME header fields are present, the parser should verify |
| that the transfer encoding, if any, is 7bit, 8bit, or binary. |
| Specifically, if base64 or quoted-printable is used, the |
| parser must exit gracefully, as YAML forbids transfer |
| encodings. Also, if the Content-Type is multipart, the parser |
| must exit, as support for multipart content is forbidden in |
| version 1.0 of YAML. |
| |
| Further, the parser should examine the Content-Type, and |
| should exit gracefully if the charset is not supported by the |
| parser. Thus, other encodings may be supported by a given |
| parser, but parsers are only required to support UTF16 |
| (excepting surrogate pairs) and ISO 8859-1. Finally, the |
| parser must check for an X-YAML-Version header field and |
| should assume version 1.0 if the MIME header is missing or |
| this specific header field is absent. A parser may make these |
| MIME header fields available through its API, but this is not |
| a requirement. |
| |
| If content before the first indicator exists, but does not |
| "look" like a MIME header, then the parser may issue a |
| warning message. Specifically, any line in the header having |
| whitespace followed by an indicator ($@%) is an error and |
| must be reported. Finally, if a header exists, then the line |
| immediately before the body must be a blank line as specified |
| by the MIME specification. |
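The checks in this section could be strung together roughly as follows, using only Python's standard codecs and email modules. The function names are made up for the sketch, ValueError stands in for "exit gracefully", the optional warning for non-MIME header content is omitted, and the surrogate pair restriction is not enforced.

```python
import codecs
from email import message_from_string

INDICATORS = "$@%"
SUPPORTED_CHARSETS = {"utf-16", "iso-8859-1"}


def decode_yaml_bytes(data):
    """UTF16 if a byte order mark is present, otherwise ISO 8859-1."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # this codec consumes the BOM
    return data.decode("iso-8859-1")


def split_header_body(text):
    """Split lines at the first one that starts with an indicator."""
    lines = text.splitlines()
    header, body = lines, []
    for i, line in enumerate(lines):
        if line and line[0] in INDICATORS:
            header, body = lines[:i], lines[i:]
            break
    for line in header:
        stripped = line.lstrip()
        if stripped != line and stripped and stripped[0] in INDICATORS:
            raise ValueError("indented indicator in header: " + line)
    if header and body and header[-1].strip():
        raise ValueError("header must end with a blank line")
    return header, body


def check_mime(header):
    """Apply the MIME rules; return the YAML version (default 1.0)."""
    msg = message_from_string("\n".join(header))
    encoding = msg.get("Content-Transfer-Encoding", "7bit").strip().lower()
    if encoding not in ("7bit", "8bit", "binary"):
        raise ValueError("transfer encoding not allowed: " + encoding)
    if msg.get_content_maintype() == "multipart":
        raise ValueError("multipart content is not allowed")
    charset = msg.get_content_charset()
    if charset is not None and charset not in SUPPORTED_CHARSETS:
        raise ValueError("unsupported charset: " + charset)
    return msg.get("X-YAML-Version", "1.0").strip()
```

A caller would decode the bytes first, split the result, and only then hand the header lines to check_mime before parsing the body.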
+---------------------------------------------------------------+
| Serialization Format / BNF |
+---------------------------------------------------------------+
| This section contains the BNF productions for the YAML |
| syntax. Much to do... |
+---------------------------------------------------------------+
| Parser Behavior |
+---------------------------------------------------------------+
| This section describes how a parser should parse YAML. Much |
| to do... |
+---------------------------------------------------------------+
| Emitter Behavior / Canonical Form |
+---------------------------------------------------------------+
| This section describes how an emitter should write YAML into |
| canonical form. It includes a specific word-wrapping |
| algorithm, which keeps a minimal content length of 20 |
| characters and does its best to word-wrap by 76 columns. |
+---------------------------------------------------------------+
| Implementations |
+---------------------------------------------------------------+
| To do... an implementation in C, C++/STL, Python, Java, and |
| ... |
+---------------------------------------------------------------+
| Credits |
+---------------------------------------------------------------+
| This work is the result of long, thoughtful discussions on |
| the SML-DEV mailing list. Specific contributors include... |
| (to do) |
+---------------------------------------------------------------+
| Some thoughts |
+---------------------------------------------------------------+
| 1. These are very preliminary thoughts on the subject; |
| feedback is very welcome. |
| 2. Implementations needed... Clark is happy to write the |
| Python, C, and perhaps even a C++ implementation. Any |
| takers? |
| 3. Was thinking hard about using # for a comment indicator, |
| or perhaps as a numeric indicator. Benefits? In any case, |
| the BNF should leave all of these special characters open |
| to future versions. |
| |
+---------------------------------------------------------------+
| FAQ |
+---------------------------------------------------------------+
| 1. Don't the indicator characters need to be escaped in the |
| content? Answer: No. |
| |
+---------------------------------------------------------------+