
@majensen
Last active June 28, 2018 02:49
@perlpunk (TINITA) Final Report - Complete YAML::PP

(submitted by @perlpunk to @majensen)

Please make public comments here on the TPF blog entry. Thanks.

YAML::PP Grant Report

Original Grant Proposal: Complete YAML::PP

Up to the March 2018 report, I had worked about 305 hours on YAML-related things. Adding the time since then (and excluding the time spent in Oslo in April), I think it's safe to say I spent at least 340 hours in total. That's a bit more than two months of full-time work.

In its current state, you can rely on the documented YAML::PP functionality, but not so much on its API. Be aware that the usage might change.

Pin the used YAML::PP version, and run your test suite when upgrading. (You have a test suite, right?)

I will do my best to document breaking changes in the Changelog.

Deliverables

Here is a comparison of my original plans to what I've done.

Complete YAML::PP::Parser

Original

  • Flow Style
  • Flow Nodes as mapping keys
  • Line and Column Numbers for error messages

Status

Most of flow style is done.

The remaining cases are either rarely used and avoidable, or edge cases:

  • Empty nodes where a comma or ] directly follows a tag or anchor: [&anchor,], {foo:,}
  • Implicit mappings in flow sequences: [a: 1, b: 2] == [{a: 1},{b: 2}]
  • Explicit keys in flow collections: [ ? key : value ]
  • "Empty" documents (two document end markers ...)
  • Unquoted strings ending with :, e.g. foo::: bar equals "foo::": bar
  • No space after colon when key is quoted {"foo":23}

Flow nodes as mapping keys are not really relevant for perl, since they can't be loaded into native hashes.

[a, b]: [1, 2]

Most error messages include line and column numbers, and the lexer reports which tokens it expected and which token it got instead.
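To illustrate the kind of error reporting described above, here is a minimal sketch in Python (chosen only for brevity; this is not YAML::PP code). It tokenizes a tiny subset of flow sequences, tracks 1-based line and column numbers, and reports the expected token types next to the one it actually found. The names `LexError`, `tokenize`, `NEXT` and `parse_flow_sequence` are hypothetical, and the grammar is deliberately far smaller than real YAML.

```python
import re


class LexError(Exception):
    """Error that carries the position and the expected/actual token types."""

    def __init__(self, line, column, expected, got):
        self.line, self.column = line, column
        self.expected, self.got = list(expected), got
        super().__init__(
            f"Line {line}, column {column}: "
            f"expected {' or '.join(self.expected)}, got {got}"
        )


# Token patterns for a tiny flow-sequence subset (illustrative only).
TOKEN_RE = re.compile(r"""
      (?P<WS>[^\S\n]+)        # whitespace other than newline
    | (?P<NEWLINE>\n)
    | (?P<LBRACKET>\[)
    | (?P<RBRACKET>\])
    | (?P<COMMA>,)
    | (?P<PLAIN>[^\s\[\],]+)  # simplified plain scalar
""", re.VERBOSE)

# Which token types may follow the previous one.
NEXT = {
    "START": ["LBRACKET"],
    "LBRACKET": ["PLAIN", "RBRACKET"],
    "PLAIN": ["COMMA", "RBRACKET"],
    "COMMA": ["PLAIN", "RBRACKET"],
    "RBRACKET": ["EOF"],
}


def tokenize(text):
    """Yield (type, value, line, column) tuples, ending with an EOF token."""
    line, column = 1, 1
    for match in TOKEN_RE.finditer(text):
        name = match.lastgroup
        if name == "NEWLINE":
            line, column = line + 1, 1
            continue
        if name != "WS":
            yield name, match.group(), line, column
        column += len(match.group())
    yield "EOF", "", line, column


def parse_flow_sequence(text):
    """Parse '[a, b]' into a list of strings, or raise LexError."""
    items, previous = [], "START"
    for name, value, line, column in tokenize(text):
        expected = NEXT[previous]
        if name not in expected:
            raise LexError(line, column, expected, name)
        if name == "PLAIN":
            items.append(value)
        previous = name
    return items
```

With this sketch, parsing `"[a,\n b c]"` raises an error whose message is `Line 2, column 4: expected COMMA or RBRACKET, got PLAIN`.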

Currently there are 21 failing parsing tests according to the YAML Test Matrix. To see what to avoid you can view every failing test case.

Most of the parsing is done via a grammar, but there is still also manual parsing going on that should be transferred to the grammar in the future.

YAML::PP::Loader/Constructor

Original

  • Implement loading of Tags and blessing into objects
  • Provide a possibility for safe loading
  • Ideally provide a way to only load certain tags

Status

Currently you can load scalars into objects or transform data, by providing a regex or list of strings, and/or a tag name. You can provide a code reference which gets the original YAML scalar and its style as an argument. There's an example in the distribution that has a little templating feature and can load external vars.
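The idea of matching scalars by regex, by a list of strings, and/or by tag, and then handing the original scalar and its style to a callback, can be sketched as follows. This is a Python illustration under stated assumptions, not the YAML::PP API: the class `ScalarResolver`, its `add` and `construct` methods, and the style names are all hypothetical.

```python
import re
from datetime import date


class ScalarResolver:
    """Toy registry that maps scalars to constructor callbacks."""

    def __init__(self):
        self._rules = []

    def add(self, callback, *, tag=None, regex=None, equals=None):
        """Register a callback for a tag, a regex and/or a list of strings."""
        if tag is None and regex is None and equals is None:
            raise ValueError("give at least one of tag, regex, equals")
        pattern = re.compile(regex) if regex is not None else None
        values = frozenset(equals) if equals is not None else None
        self._rules.append((tag, pattern, values, callback))

    def construct(self, value, style="plain", tag=None):
        """Return the callback result of the first rule matching all criteria."""
        for rule_tag, pattern, values, callback in self._rules:
            if rule_tag is not None and rule_tag != tag:
                continue
            if pattern is not None and not pattern.fullmatch(value):
                continue
            if values is not None and value not in values:
                continue
            return callback(value, style)
        return value  # no rule matched: keep the scalar as a string


def load_date(value, style):
    # Quoted scalars stay strings; only plain scalars become dates.
    return date.fromisoformat(value) if style == "plain" else value


def load_upper(value, style):
    return value.upper()


resolver = ScalarResolver()
resolver.add(load_date, regex=r"\d{4}-\d{2}-\d{2}")
resolver.add(load_upper, tag="!upper")
resolver.add(lambda value, style: True, equals=["yes", "on"])
```

Because the callback receives the style, it can decide for itself whether a quoted scalar should be transformed or left alone, as `load_date` does here.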

Safe loading is the default. I added an option to detect or reject cyclic references.
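Detecting or rejecting cyclic references amounts to checking whether a container can be reached from itself. The sketch below shows one way to do that in Python for nested lists and dicts. It illustrates the technique only: the function names and the policy values `allow`, `warn` and `fatal` are assumptions for this example, not a description of the YAML::PP option.

```python
import warnings


class CyclicReferenceError(ValueError):
    """Raised when cyclic data is rejected."""


def has_cycle(node, _path=None):
    """Return True if a list or dict is reachable from itself."""
    if not isinstance(node, (list, dict)):
        return False
    path = _path if _path is not None else set()
    if id(node) in path:
        return True  # we are already inside this container
    path.add(id(node))
    children = node.values() if isinstance(node, dict) else node
    try:
        return any(has_cycle(child, path) for child in children)
    finally:
        # Leaving the container: a second, non-cyclic reference is fine.
        path.discard(id(node))


def check_cycles(data, policy="fatal"):
    """Apply a policy ('allow', 'warn' or 'fatal') to cyclic structures."""
    if policy not in ("allow", "warn", "fatal"):
        raise ValueError(f"unknown policy {policy!r}")
    if policy != "allow" and has_cycle(data):
        if policy == "fatal":
            raise CyclicReferenceError("data contains a cyclic reference")
        warnings.warn("data contains a cyclic reference")
    return data
```

Note that an alias used twice in different places (the same object referenced from two keys) is not a cycle; only a structure that contains itself is reported.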

Custom loading is not yet possible for mappings or sequences. Because YAML supports cyclic references, implementing it can get quite complicated.

Originally I planned to implement just one standard Schema plus the generic perl objects. It would have been easier to hardcode this, but I decided to make it more generic. It should end up as powerful as PyYAML, for example. I regularly see questions asked that are using PyYAML's features, so I think Perl should also have something like this. After all, it's Perl!

Instead, I implemented all three YAML 1.2 Schemas in a generic way. You can load and dump data structures using the Failsafe, JSON and Core Schemas. The only other YAML processor I know of that can load different schemas is js-yaml, but it currently supports only Failsafe and Core.
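The practical difference between the three schemas is how an unquoted scalar is resolved. The following Python sketch follows the resolution patterns given in the YAML 1.2 specification. It is an illustration, not the YAML::PP implementation, and the function `resolve` and its parameters are hypothetical. One simplification is labeled in the code: under the JSON Schema, an unmatched plain scalar is returned as a string here.

```python
import re

# Core Schema patterns (YAML 1.2 spec, section 10.3).
NULL_CORE = re.compile(r"null|Null|NULL|~|")
BOOL_CORE = re.compile(r"true|True|TRUE|false|False|FALSE")
INT_CORE = re.compile(r"[-+]?[0-9]+")
OCT_CORE = re.compile(r"0o[0-7]+")
HEX_CORE = re.compile(r"0x[0-9a-fA-F]+")
FLOAT_CORE = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")
INF_CORE = re.compile(r"[-+]?\.(inf|Inf|INF)")
NAN_CORE = re.compile(r"\.(nan|NaN|NAN)")

# JSON Schema patterns (YAML 1.2 spec, section 10.2).
INT_JSON = re.compile(r"-?(0|[1-9][0-9]*)")
FLOAT_JSON = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]*)?([eE][-+]?[0-9]+)?")


def resolve(value, schema="core", quoted=False):
    """Resolve a scalar's text to a Python value under the given schema."""
    if schema not in ("failsafe", "json", "core"):
        raise ValueError(f"unknown schema {schema!r}")
    if quoted or schema == "failsafe":
        return value  # Failsafe knows only strings; quoted scalars stay strings
    if schema == "json":
        if value == "null":
            return None
        if value in ("true", "false"):
            return value == "true"
        if INT_JSON.fullmatch(value):
            return int(value)
        if FLOAT_JSON.fullmatch(value):
            return float(value)
        return value  # simplification: treated as a string in this sketch
    if NULL_CORE.fullmatch(value):
        return None
    if BOOL_CORE.fullmatch(value):
        return value.lower() == "true"
    if INT_CORE.fullmatch(value):
        return int(value)
    if OCT_CORE.fullmatch(value):
        return int(value[2:], 8)
    if HEX_CORE.fullmatch(value):
        return int(value[2:], 16)
    if FLOAT_CORE.fullmatch(value):
        return float(value)
    if INF_CORE.fullmatch(value):
        return float(value.lower().replace(".", ""))
    if NAN_CORE.fullmatch(value):
        return float("nan")
    return value
```

For example, the plain scalar `True` stays a string under Failsafe and JSON but becomes a boolean under Core, and `0x1F` is an integer only under Core.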

Adding the YAML 1.1 types to YAML::PP to be able to load 1.1 documents should be easy.

For loading generic perl objects, I have to add the custom loading for mappings and sequences first.

Emitter/Dumper

Original

  • Write YAML::PP::Emitter
  • Write YAML::PP::Dumper/Deconstructor

Status

The Dumper is able to dump all data structures except objects or things like typeglobs or coderefs.

The Emitter is able to output all test cases (that the parser can parse) correctly, except for folded block scalars. Since folded block scalars aren't used by default, you should be able to use the Emitter correctly for all data.

I wrote YAML::PP::Representer, which is the opposite of YAML::PP::Constructor. It is responsible for the Schema: it decides whether something is an integer, float, boolean, undef or string, and tells the emitter whether it has to be quoted or not.
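The quoting decision can be sketched like this: a string must be quoted whenever its plain form would be loaded back as something other than a string. The Python example below assumes the YAML 1.2 Core Schema and uses a deliberately conservative, simplified set of syntax checks. The names `represent` and `needs_quotes` are hypothetical, and this is not how YAML::PP::Representer is actually written.

```python
import math
import re

# Plain scalars that the Core Schema would NOT load as strings.
CORE_NON_STRING = re.compile(r"""
      null | Null | NULL | ~
    | true | True | TRUE | false | False | FALSE
    | [-+]?[0-9]+ | 0o[0-7]+ | 0x[0-9a-fA-F]+
    | [-+]?(\.[0-9]+ | [0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?
    | [-+]?\.(inf|Inf|INF) | \.(nan|NaN|NAN)
""", re.VERBOSE)

# Characters that are risky at the start of a plain scalar (conservative).
INDICATORS = "-?:,[]{}#&*!|>'\"%@`"


def needs_quotes(text):
    """Return True if a string cannot safely be emitted as a plain scalar."""
    if text == "" or CORE_NON_STRING.fullmatch(text):
        return True  # would be loaded as null, a boolean or a number
    if text != text.strip() or "\n" in text:
        return True  # leading/trailing whitespace or line breaks
    if text[0] in INDICATORS:
        return True
    return ": " in text or " #" in text or text.endswith(":")


def represent(value):
    """Return (text, quote) telling an emitter what to write and how."""
    if value is None:
        return "null", False
    if isinstance(value, bool):  # check bool before int: bool is an int subtype
        return ("true" if value else "false"), False
    if isinstance(value, int):
        return str(value), False
    if isinstance(value, float):
        if math.isnan(value):
            return ".nan", False
        if math.isinf(value):
            return (".inf" if value > 0 else "-.inf"), False
        return repr(value), False
    if isinstance(value, str):
        return value, needs_quotes(value)
    raise TypeError(f"cannot represent {type(value).__name__}")
```

The boolean `True` and the string `"true"` both produce the text `true`, but only the string is flagged for quoting, so the two survive a round trip as different types.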

The test suite and related projects are slowly attracting more developers. I think it's a success, but there's still a lot of work.

I added 64 tests and fixed existing tests, especially the tests for JSON comparison, since I was the only one using these tests until recently.

Additional work

I did a lot of work for YAML.pm and YAML::XS, too. Although this was not part of the original grant proposal, I think it improved the state of YAML in Perl a lot, so I included this work in my reports.

I added $YAML::XS::LoadBlessed, so you can now (quite) safely load YAML from untrusted sources.

You can now serialize booleans with the $YAML::XS::Boolean option, enabling you to exchange data with JSON modules and others that use booleans.

I fixed a bug with loading many regexes in one YAML file, which resulted in a segfault.

Loading and dumping the same regex multiple times no longer makes the regex grow.

In the test matrix, YAML::XS is now very close to PyYAML.

YAML.pm is still widely used and quite incompatible with other YAML processors. One reason is that it was written for YAML 1.0. But there were also bugs and problems which I was able to fix.

I added $YAML::LoadBlessed, so all YAML modules on CPAN are now safe regarding loading objects.

Other changes:

  • Fixed a problematic regex for parsing quoted strings
  • Added support for trailing comments. So far I know of one CPAN module that broke because of that, but fortunately it was only the test suite that needed a patch. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898561 Thanks to the debian crew! If you are using MARC::Transform be sure to add quotes around strings with <space>#
  • Fixed two problems with mapping keys that equal = or start with =<space>
  • Fixed the same bug as in YAML::XS with growing regexes
  • Fixed bug when loading top level scalars with multiple spaces
  • Support compact nested sequences
  • Support zero indented sequences

The last two in particular improve interoperability with modules like YAML::XS.
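The trailing-comment change listed above is the reason unquoted strings containing `<space>#` now need quotes. In YAML, a `#` starts a comment only at the beginning of a line or after whitespace. The Python sketch below shows that rule for a single unquoted value. It is an illustration of the rule, not the YAML.pm code, and the function name is hypothetical.

```python
def strip_trailing_comment(plain):
    """Drop a trailing comment from an unquoted scalar.

    A '#' starts a comment only at the start of the text or after
    a space or tab; otherwise it is part of the value.
    """
    for index, char in enumerate(plain):
        if char == "#" and (index == 0 or plain[index - 1] in " \t"):
            return plain[:index].rstrip()
    return plain.rstrip()
```

Under this rule the value `title #1` is read as `title`, while `title#1` is kept whole. Writing `"title #1"` in quotes is what preserves the full string.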

The YAML editor consists of a docker image with 17 different YAML processors built in, several programs/scripts that parse/load/emit/dump YAML, and some clever vimscript to play with several frameworks at once.

I did several small fixes for the existing views, added new ones and improved documentation a bit.

Recently, Herbert Riedel, a Haskell developer, joined us on IRC, and he is writing a YAML Parser/Loader on top of the YAML Reference Parser, written by Oren Ben-Kiki. Ingy added it to the YAML Editor. This is really helpful because we can now easily see how the reference parser parses the test cases. You can see it in the test matrix now; it's currently the parser and loader that passes the most test cases.

I think the test matrix is an important part of the test infrastructure because it visualizes the test suite and also gives a quick overview of existing YAML processors.

I added an overview page to quickly compare all the processors to each other: https://matrix.yaml.io/

I added the results for the invalid tests to the overview.

I want to add a page to it that shows all test cases highlighted, so people can get a very quick impression of what the test suite contains, instead of having to browse all test cases manually. It will look similar to YAML::PP highlighted results, so I will do the highlighting in perl.

Although recently Ingy changed the test suite to the new TestML format which is now processed by nodejs, a lot of the test infrastructure is powered by perl.

Blog posts

I wrote five blog posts, and I think the tutorials have already been very helpful, at least to me. When someone asks about YAML on IRC or stackoverflow, I can often just give a small example and then refer to one of the articles.

Talks

I gave a 40 minute talk on The state of the YAML at TPC in Amsterdam.

Feedback showed that it was a bit too much theory for most of the audience.

At the London Perl Workshop in November I gave a more practical 20 minute talk about YAML - Where and how to use? What's new? (Video), and a Lightning Talk YAML::PP - Just another YAML Framework? (Video). I got positive feedback for those.

Thanks to everyone who gave feedback; it helps me improve my talks.

All past reports

Thanks to...

  • Ingy for inventing YAML
  • Felix Krause for helping me understand the Spec
  • My Grant Manager Mark for helping me with my reports