(original: http://groups.yahoo.com/group/sml-dev/message/4729) | |
(archived: https://web.archive.org/web/20010603012942/http://groups.yahoo.com/group/sml-dev/message/4710) | |
From: "Clark C . Evans" <cce@c...> | |
Date: Fri May 11, 2001 8:50 pm | |
Subject: YAML Draft 0.1 | |
With quite a bit of work, I've tried to come up with | |
a first pass at this proposal. It's at www.yaml.org, | |
below is the current text version. Your comments | |
would be very cool. | |
Thanks! | |
Clark | |
+---------------------------------------------------------------+ | |
| Welcome to YAML, Draft 0.1 | | |
+---------------------------------------------------------------+ | |
| YAML is a straightforward markup language, offering an | |
| alternative to XML, borrowing ideas from C, HTML, Perl, and | | |
| Python. | | |
| | | |
| * YAML texts are brief and readable. | | |
| * YAML is very expressive and extensible. | | |
| * YAML has a simple stream-based interface. | |
| * YAML uses data structures native to your programming | |
|   language. | |
| * YAML is easy to implement, perhaps too easy. | |
| * YAML has a solid information model, no exceptions, no | |
|   mess. | |
| | | |
+---------------------------------------------------------------+ | |
| Key Concepts | | |
+---------------------------------------------------------------+ | |
| YAML is founded on several key concepts from very successful | | |
| languages. | | |
| | | |
| * YAML handles whitespace much as HTML does. In YAML, | |
|   sequences of spaces, tabs, and carriage return characters | |
|   are folded into a single space during parsing. This | |
|   wonderful technique makes markup readable by enabling | |
|   indentation without affecting the canonical form of the | |
|   content. | |
| * YAML uses backslash-style escape sequences similar to | |
|   those of C. In YAML, the backslash, \ , is used as an | |
|   escape indicator. As in C, \n represents a new line, \t | |
|   represents a tab, and \\ represents a backslash. In | |
|   addition, since whitespace is folded, YAML introduces \s | |
|   to represent an additional space that is part of the | |
|   content and should not be folded. Further, a \ at the | |
|   end of a line acts as a continuation marker, allowing | |
|   content to be broken into multiple lines without | |
|   introducing unwanted whitespace. | |
| * YAML uses data typing similar to Perl's. In YAML, there | |
|   are three fundamental types of data: scalars, which are | |
|   indicated by a dollar ($) sign; maps/hashes, which are | |
|   indicated by a percent (%) sign; and lists/vectors, which | |
|   are indicated by an at (@) sign. Also like Perl, all node | |
|   names (variables) begin with one of these indicators. As | |
|   a result, YAML's in-memory representation uses your | |
|   language's native map, list, and string constructs | |
|   rather than inventing its own object model. | |
| * YAML uses block scoping similar to Python's. In YAML, the | |
|   extent of a node is indicated by its children's nesting | |
|   level, i.e., what column they are in. Skeptical as you | |
|   may be, ask anyone who has worked with Python, and you | |
|   will hear that it makes the code more readable and less | |
|   error prone. Try it. It makes life easy. | |
| | | |
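The whitespace folding and escape rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the draft's parser: the names decode_scalar, ESCAPES and FOLD are made up here, line feeds are folded along with the whitespace characters the draft lists, folded whitespace at either end of a value is dropped, and \r is accepted because the XML conversion notes in this draft mention it.

```python
# Escapes named in the draft (\r appears in its XML conversion notes).
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "s": " ", "\\": "\\"}
WHITESPACE = " \t\r\n"
FOLD = object()  # marker for a folded (not escaped) space


def decode_scalar(raw: str) -> str:
    """Fold whitespace and resolve backslash escapes in one pass."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\":
            nxt = raw[i + 1:i + 2]
            if nxt and nxt in ESCAPES:
                out.append(ESCAPES[nxt])
                i += 2
            elif nxt and nxt in "\r\n":
                # Continuation: drop the backslash, the line break,
                # and the indentation of the following line.
                i += 1
                while i < len(raw) and raw[i] in WHITESPACE:
                    i += 1
            else:
                raise ValueError(f"unknown escape at offset {i}")
        elif ch in WHITESPACE:
            # A whole run of whitespace folds into one space.
            while i < len(raw) and raw[i] in WHITESPACE:
                i += 1
            out.append(FOLD)
        else:
            out.append(ch)
            i += 1
    # Assumption: folded whitespace at either end is not content.
    while out and out[0] is FOLD:
        out.pop(0)
    while out and out[-1] is FOLD:
        out.pop()
    return "".join(" " if piece is FOLD else piece for piece in out)
```

Keeping folded spaces as a separate marker until the end is what lets an escaped \s survive at the edges of a value while ordinary indentation is discarded.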
+---------------------------------------------------------------+ | |
| Example | | |
+---------------------------------------------------------------+ | |
| Below is an example of an invoice expressed via YAML. | |
| $invoice 00034843 | |
| $date 12-JAN-2001 | |
| %buyer | |
|     $given-name Chris | |
|     $family-name Dumars | |
|     %address | |
|         $line1 458 Wittigen's Way | |
|         $line2 Suite #292 | |
|         $city Royal Oak | |
|         $state MI | |
|         $postal 48046 | |
| @order | |
|     %product | |
|         $id BL394D | |
|         $desc Grade A, Leather Hide Basketball | |
|         $price $450.00 | |
|         $quantity 4 | |
|     %product | |
|         $id BL4438H | |
|         $desc Super Hoop (tm) | |
|         $price $2,392.00 | |
|         $quantity 1 | |
| $comments | |
|     Mr. Dumars is frequently gone in the morning | |
|     so it is best advised to try things in late | |
|     afternoon. \nIf Joe isn't around, try his house\ | |
|     keeper, Nancy Billsmer @ (734) 338-4338. | |
| %delivery | |
|     $method UZS Express Overnight | |
|     $price $45.50 | |
| $tax 0% | |
| $total $4237.50 | |
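To make the column-based scoping concrete, here is a rough Python sketch that reads indicator lines into native structures. Everything in it is an assumption made for illustration: the function name parse, the choice of a dict for a map and a list of (name, value) pairs for a list, and the indentation of SAMPLE, which is a shortened invoice with its nesting guessed. Multi-line string content such as $comments is left out.

```python
INDICATORS = "$%@"

SAMPLE = """\
$invoice 00034843
%buyer
    $given-name Chris
    %address
        $city Royal Oak
@order
    %product
        $id BL394D
    %product
        $id BL4438H
$total $4237.50
"""


def parse(text):
    """Build native lists, dicts and strings from indented indicator lines."""
    root = []                 # the document itself is a list
    stack = [(-1, root)]      # (column of the opening line, container)
    for line in text.splitlines():
        if not line.strip():
            continue
        column = len(line) - len(line.lstrip(" "))
        body = line.strip()
        indicator, rest = body[0], body[1:]
        if indicator not in INDICATORS:
            raise ValueError(f"line does not start with an indicator: {line!r}")
        name, _, value = rest.partition(" ")
        name = name or None   # a node may be anonymous
        # A node ends when a later line sits at or left of its column.
        while stack[-1][0] >= column:
            stack.pop()
        parent = stack[-1][1]
        if indicator == "$":
            node = value.strip()
        elif indicator == "%":
            node = {}
        else:
            node = []
        if isinstance(parent, dict):
            if name in parent:
                raise ValueError(f"duplicate name in map: {name!r}")
            parent[name] = node
        else:
            parent.append((name, node))
        if indicator != "$":
            stack.append((column, node))
    return root


document = parse(SAMPLE)
```

The stack of open containers is the whole trick: no closing tags are needed because dedenting closes a node.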
+---------------------------------------------------------------+ | |
| Information Model | | |
+---------------------------------------------------------------+ | |
| The information model is similar to XML, although it has many | | |
| significant differences. | | |
| Document  The starting production for YAML is a List. | |
| List      An ordered sequence of zero or more Nodes. | |
| Node      An ordered tuple having an optional Name followed | |
|           by a mandatory Value. | |
| Name      Identical to the Name production in the XML 1.0 | |
|           specification. | |
| Value     Exactly one of String, Map, or List. | |
| String    A sequence of zero or more characters. A character | |
|           is identical to the character defined in the Char | |
|           production of the XML 1.0 specification. | |
| Map       An unordered sequence of zero or more Nodes such | |
|           that each Node's Name is unique within the | |
|           sequence. There may be only a single node without a | |
|           name in each map. | |
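One possible reading of these productions as Python types follows. It is a sketch, not part of the draft: the class names Node, YList and YMap are invented here, and the check that a Name matches the XML 1.0 Name production is left out. Treating a missing name as None means the "only a single node without a name" rule falls out of the same uniqueness check.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    value: "Value"
    name: Optional[str] = None  # the Name is optional


@dataclass
class YList:
    # Ordered sequence of zero or more Nodes.
    nodes: List[Node] = field(default_factory=list)


@dataclass
class YMap:
    # Unordered in the model; kept in a list here only for storage.
    nodes: List[Node] = field(default_factory=list)

    def __post_init__(self):
        seen = set()
        for node in self.nodes:
            # None stands for "no name", so two anonymous nodes collide.
            if node.name in seen:
                raise ValueError(f"duplicate name in map: {node.name!r}")
            seen.add(node.name)


# A Value is exactly one of String, Map, or List.
Value = Union[str, YList, YMap]
```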
+---------------------------------------------------------------+ | |
| Common XML Compatibility | | |
+---------------------------------------------------------------+ | |
| Although the syntax is distinctly different, a restricted | | |
| subset of YAML can be used to provide an isomorphic image of | | |
| a Common XML text. This involves a few conventions layered | |
| upon YAML. Following are the simple mapping conventions. | | |
| <x/>                     %x            An XML element can be | |
|                            @           represented in YAML | |
|                                        using a map node with | |
|                                        an anonymous list | |
|                                        child. | |
| <x>text</x>              %x            An XML text node can | |
|                            @           be represented in | |
|                              $ text    YAML using an | |
|                                        anonymous string node | |
|                                        in the context of an | |
|                                        anonymous list node. | |
| <x att="value"/>         %x            An XML attribute node | |
|                            $att value  is represented in | |
|                            @           YAML using a named | |
|                                        string node in the | |
|                                        context of a map | |
|                                        node. | |
| <x><y/></x>              %x            An XML parent/child | |
|                            @           element relationship | |
|                              %y        can be represented in | |
|                                @       YAML by placing the | |
|                                        element's | |
|                                        representation within | |
|                                        the anonymous list | |
|                                        node. | |
| <x a="val">text<y/></x>  %x            Of course, these all | |
|                            $a val      play together. | |
|                            @ | |
|                              $ text | |
|                              %y | |
|                                @ | |
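As a rough illustration of the verbose mapping, the following Python sketch walks an element tree from the standard xml.etree.ElementTree module and emits one line per YAML node. The function name to_yaml_x and the two-column indentation step are choices made for this example only, and the whitespace escaping of text and attribute values is deliberately left out.

```python
import xml.etree.ElementTree as ET


def to_yaml_x(elem, depth=0, step=2):
    """Return the verbose YAML lines for one XML element (no escaping)."""
    pad = " " * (depth * step)
    child_pad = " " * ((depth + 1) * step)
    item_pad = " " * ((depth + 2) * step)
    lines = [f"{pad}%{elem.tag}"]            # element -> map node
    for key, value in elem.attrib.items():
        lines.append(f"{child_pad}${key} {value}")  # attribute -> named string
    lines.append(f"{child_pad}@")            # children -> anonymous list
    if elem.text:
        lines.append(f"{item_pad}$ {elem.text}")    # text -> anonymous string
    for child in elem:
        lines.extend(to_yaml_x(child, depth + 2, step))
        if child.tail:
            # Text following a child element is another text node.
            lines.append(f"{item_pad}$ {child.tail}")
    return lines


example = to_yaml_x(ET.fromstring('<x a="val">text<y/></x>'))
```

Joining example with line breaks gives the same six lines as the last row of the table above.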
| This mapping has an abbreviated form, which is the default | | |
| conversion, although the more verbose form lends itself | | |
| better to generic processing. Thus, an XML to YAML-X converter | |
| should offer both forms. | | |
| <x/>                     $x            If there are no | |
|                                        children and no | |
|                                        attributes, an XML | |
|                                        element can be | |
|                                        written as a named | |
|                                        string node with zero | |
|                                        characters. | |
| <x>text</x>              $x text       If an XML element has | |
|                                        only a single text | |
|                                        node child with no | |
|                                        attributes, then it | |
|                                        can be represented | |
|                                        using a named string | |
|                                        node. | |
| <x att="value"/>         %x            If an XML element | |
|                            $att value  with attributes lacks | |
|                                        children, the | |
|                                        anonymous list node | |
|                                        may be omitted. | |
| <x><y/></x>              @x            An XML element with | |
|                            $y          children and no | |
|                                        attributes may be | |
|                                        represented as a YAML | |
|                                        list node. | |
| When converting text nodes and attribute values from XML, | |
| significant whitespace must be escaped using \r for carriage | |
| return, \n for new line, \t for a tab, \s for an additional | |
| space, and \\ for a backslash. By default, the conversion | |
| should wrap content as described in the serialization | |
| section below. The following examples show how specific text | |
| nodes could be converted. | |
| <x>                      $x \ntext       In this case, a new | |
| text</x>                                 line had to be | |
|                                          escaped. | |
| <x>long line</x>         $x              String content is | |
|                            long          converted here using | |
|                            line          multiple indented | |
|                                          lines. No escaping | |
|                                          here due to YAML's | |
|                                          whitespace folding. | |
|                                          The YAML version | |
|                                          contains one | |
|                                          significant space | |
|                                          between "long" and | |
|                                          "line". | |
| <x>nospace</x>           $x              Here multiple lines | |
|                            no\           are also used, | |
|                            space         however a trailing | |
|                                          escape \ indicates | |
|                                          that the line break | |
|                                          does not induce a | |
|                                          significant space. | |
| <x>a \ esc</x>           $x a \\ esc     Of course, \ in | |
|                                          content must be | |
|                                          escaped. | |
| <x>                      $x              A bit more | |
| text with 2  sp            \n text with  complicated. | |
| </x>                       2\s sp\n | |
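The escaping side of the conversion might look like the Python sketch below. It is an assumption-laden illustration: the name encode_scalar is invented, line wrapping is not attempted, and spaces at the very start or end of the text are always written as \s on the guess that a parser would otherwise fold them away. Within a run of spaces it follows the "2\s sp" example above: every space but the last becomes \s.

```python
import re

# Characters that always need an escape when written out.
SIMPLE = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def encode_scalar(text: str) -> str:
    """Escape XML text or attribute content for use as a YAML string."""
    escaped = "".join(SIMPLE.get(ch, ch) for ch in text)

    def escape_run(match):
        count = len(match.group(0))
        at_edge = match.start() == 0 or match.end() == len(escaped)
        if at_edge:
            # Assumption: edge spaces would be folded away, so escape all.
            return "\\s" * count
        # One literal space survives folding; the rest must be escaped.
        return "\\s" * (count - 1) + " "

    return re.sub(r" +", escape_run, escaped)
```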
+---------------------------------------------------------------+ | |
| Encoding | | |
+---------------------------------------------------------------+ | |
| A YAML text may use the UTF16 or ISO 8859-1 character | |
| encodings. YAML explicitly allows MIME headers to specify | |
| alternative encodings and provide document level meta-data, | |
| including a YAML version number. | |
| | | |
| A YAML Parser should check for a UTF16 byte order mark. If it | | |
| is found, then the YAML text is encoded using UTF16, | | |
| surrogate pairs excepted. Otherwise, the parser should assume | |
| 8 bit ISO LATIN, 8859-1. The default is not UTF8 since UTF8 | | |
| is not a simple single-byte encoding. Thus, a parser must | | |
| support both UTF16 and ISO 8859-1 and is not required to | | |
| support any other encodings. | | |
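The byte order mark check described above is small enough to sketch directly in Python. The function names detect_encoding and decode_text are made up for this example, and the sketch does not enforce the draft's exclusion of surrogate pairs; Python's own "utf-16" codec is relied on to consume the mark and pick the byte order.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Pick UTF-16 when a byte order mark is present, else ISO 8859-1."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "iso-8859-1"


def decode_text(data: bytes) -> str:
    # The "utf-16" codec strips the mark; ISO 8859-1 maps bytes 1:1.
    return data.decode(detect_encoding(data))
```

Because ISO 8859-1 assigns a character to every byte value, the fallback branch can never fail to decode, which fits the draft's preference for a simple single-byte default.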
| | | |
| The parser should identify the first line of the text | | |
| starting with an indicator, ($@%). All lines leading up to | | |
| this point are collectively called the header; this line and | |
| all following lines are collectively called the body. The | | |
| header should be examined for MIME header fields. | | |
| | | |
| If MIME header fields are present, the parser should verify | | |
| that a transfer encoding other than 7bit, 8bit, or binary is | | |
| not used. Specifically, if base64 or quoted-printable is | | |
| used, the parser must exit gracefully as YAML forbids | | |
| transfer encodings. Also, if the Content-Type is multipart, | | |
| the parser must exit as support for multi-part content is | | |
| forbidden with version 1.0 of YAML. | | |
| | | |
| Further, the parser should examine the Content-Type, and | | |
| should exit gracefully if the charset is not supported by the | | |
| parser. Thus, other encodings may be supported by a given | | |
| parser, but parsers are only required to support UTF16 | | |
| (excepting surrogate pairs) and ISO 8859-1. Finally, the | | |
| parser must check for an X-YAML-Version header field and | |
| should assume version 1.0 if the MIME header is missing or | |
| this specific header field is absent. A parser may make these | |
| MIME header fields available through its API, but this is | |
| not a requirement. | |
| | | |
| If content before the first indicator exists, but does not | | |
| "look" like a MIME header, then the parser may issue a | | |
| warning message. Specifically, any line in the header having | | |
| whitespace followed by an indicator ($@%) is an error and | | |
| must be reported. Finally, if a header exists, then the line | | |
| immediately before the body must be a blank line as specified | | |
| by the MIME specification. | | |
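The header rules above could be checked along these lines in Python, using the standard email package to read the MIME fields. This is a sketch under assumptions: split_header and check_header are hypothetical names, every failure is reported as a ValueError rather than a graceful exit, a missing Content-Transfer-Encoding is treated as 7bit, and deciding whether the returned charset is supported is left to the caller.

```python
from email import message_from_string

INDICATORS = ("$", "@", "%")
ALLOWED_TRANSFER = {"7bit", "8bit", "binary"}


def split_header(text):
    """Split decoded text into (header lines, body lines)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(INDICATORS):
            break  # first body line found
        if line != line.lstrip() and line.lstrip().startswith(INDICATORS):
            raise ValueError(f"indented indicator in header: {line!r}")
    else:
        raise ValueError("no line starting with an indicator")
    header, body = lines[:index], lines[index:]
    if header and header[-1].strip():
        raise ValueError("header must be followed by a blank line")
    return header, body


def check_header(header_lines):
    """Return (version, charset) or raise if the header is not allowed."""
    message = message_from_string("\n".join(header_lines) + "\n")
    transfer = message.get("Content-Transfer-Encoding", "7bit").strip().lower()
    if transfer not in ALLOWED_TRANSFER:
        raise ValueError(f"transfer encoding not allowed: {transfer}")
    if message.get_content_maintype() == "multipart":
        raise ValueError("multipart content is not supported")
    version = message.get("X-YAML-Version", "1.0").strip()
    return version, message.get_content_charset()
```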
+---------------------------------------------------------------+ | |
| Serialization Format / BNF | |
+---------------------------------------------------------------+ | |
| This section contains the BNF productions for the YAML | | |
| syntax. Much to do... | | |
+---------------------------------------------------------------+ | |
| Parser Behavior | | |
+---------------------------------------------------------------+ | |
| This section describes how a parser should parse YAML. Much | | |
| to do... | | |
+---------------------------------------------------------------+ | |
| Emitter Behavior / Canonical Form | | |
+---------------------------------------------------------------+ | |
| This section describes how an emitter should write YAML into | |
| canonical form. It includes a specific word-wrapping | |
| algorithm, which keeps a minimal content length of 20 | |
| characters and does its best to word-wrap at 76 columns. | |
+---------------------------------------------------------------+ | |
| Implementations | | |
+---------------------------------------------------------------+ | |
| To do... an implementation in C, C++/STL, Python, Java, and | | |
| ... | | |
+---------------------------------------------------------------+ | |
| Credits | | |
+---------------------------------------------------------------+ | |
| This work is the result of long, thoughtful discussions on | | |
| the SML-DEV mailing list. Specific contributors include... | | |
| (to do) | | |
+---------------------------------------------------------------+ | |
| Some thoughts | | |
+---------------------------------------------------------------+ | |
| 1. These are very preliminary thoughts on the subject; | |
| feedback is very welcome. | |
| 2. Implementations needed... Clark is happy to write the | | |
| Python, C, and perhaps even a C++ implementation. Any | | |
| takers? | | |
| 3. Was thinking hard about using # for a comment indicator, | | |
| or perhaps as a numeric indicator. Benefits? In any case, | |
| the BNF should leave all of these special characters open | | |
| to future versions. | | |
| | | |
+---------------------------------------------------------------+ | |
| FAQ | | |
+---------------------------------------------------------------+ | |
| 1. Don't the indicator characters need to be escaped in the | | |
| content? Answer: No. | | |
| | | |
+---------------------------------------------------------------+ |