@espadrine
Last active December 14, 2015 04:29
Things I like / don't like about TOML

4 Things I Find Great About TOML

  • Arrays allow trailing commas!
  • Comments! (From a JSON perspective, that's huge!)
  • Dates are UTC!
  • Simple syntax, no semicolon / commas, no need to check for matching braces!
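
As a small illustration of the points above, here is a sketch of a TOML fragment. The key names are made up for the example, and the syntax follows the spec as described in this gist.

```toml
# Comments start with a hash sign and run to the end of the line.
title = "example"                 # no semicolons or commas between keys
released = 1979-05-27T07:32:00Z   # dates are written in UTC (the "Z" suffix)
ports = [
  8001,
  8002,   # a trailing comma after the last element is allowed
]
```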

4 Things I Don't Like About TOML

  • You can't add a key to the global hash once a keygroup has started!
inGlobal = true
[keygroup]
inKeyGroup = true
# Here, you can't add a key to the global hash!
  • You can't mix data types in an array, unless the mixed values are nested one level deeper, in arrays of their own!

    So, it feels like arrays are typed, except that empty arrays are not.

key = [ [1], ["foo"], [] ]
  • You can't have hashes inside arrays!
# That really should be an array.
[dependancy1]
groupId    = "com.google.api-client"
artifactId = "google-api-client"
version    = "1.13.2-beta"

[dependancy2]
groupId    = "com.google.api-client"
artifactId = "google-api-client-servlet"
version    = "1.13.1-beta"
  • The specification is lacking!
    • A document in UTF-16 / Windows 1252 must change encoding to UTF-8 in each string!
    • What happens when the keygroup is the empty string?
    • Lots of (informally-specified) things you can't encode: keys with = in them, keygroups with ] in them… (The current spec allows = in keys by not forbidding it.) Which special characters are allowed in keys anyway?

Alternative: the Settings File Format.

@BurntSushi

Adding keys back to the global hash seems kind of strange to me. Why do you want it? And can you think of any convenient and simple syntax for doing so? One of the nice things about TOML is that all keys for any particular group are located in the same place. In fact, you're guaranteed this behavior because duplicate keys are not allowed.

Mixed arrays are bad. It's tough going for static languages, and it militates against well-formed structured data. As of now, there is a proposal to add tuples and make arrays completely homogeneous. This will allow well-typed structured data while still allowing arrays to contain mixed data.

Also, empty arrays are typed! I'm not sure why you think they aren't. In the current spec, all arrays have type array. If my proposal for homogeneous arrays and tuples goes through, then empty arrays will be polymorphic (i.e., their component type is always equal to any other type).

There is also a proposal for anonymous hashes, but the syntax is tough to get right and it isn't clear that it fits inside a simple configuration language.

> A document in UTF-16 / Windows 1252 must change encoding to UTF-8 in each string!

This is a Good Thing.

The rest of your dislikes are just things that need to be clarified in the spec. I'm sure it will come with time; it's only been a week.

@espadrine
Author

It isn't quite true that all keys for a particular group are located in the same place.
For instance:

[foo.bar]
baz = "bar"

[quux]
nox = "fox"

[foo.baz]
bar = "baz"

The idea that static languages would have a hard time with heterogeneous arrays ignores the fact that static languages usually include the type of the content in the type of the array, effectively making things like [[1], ["1"]] impossible to map to built-in arrays, unless I'm about to learn something awesome.

In that sense, empty arrays are indeed untyped: their generic type parameter cannot be inferred. We don't know the type of its content.

Indeed, yay for tuples! Although I cannot see myself using homogeneous arrays knowing that tuples are there, which makes the homogeneous argument harder to maintain. Choice is nice nonetheless.

As for having mixed encodings, what do you expect text editors to show?

I made a lengthier rant here, about the unfortunate history of configuration formats.

Anyway, if my rants can be of help, I'm glad!

@BurntSushi

> It isn't quite true that all keys for a particular group are located in the same place.

I meant the same level of the hash, as we were talking about the global hash.

> The idea that static languages would have a hard time with heterogeneous arrays ignores the fact that static languages usually include the type of the content in the type of the array, effectively making things like [[1], ["1"]] impossible to map to built-in arrays, unless I'm about to learn something awesome.

What? That's precisely the reason why static languages would have a hard time with heterogeneous arrays. Hence the proposal I linked to in my first response.

> In that sense, empty arrays are indeed untyped: their generic type parameter cannot be inferred. We don't know the type of its content.

I stated this before. In the current spec, arrays are typed as array. This includes empty arrays.

If my proposal is accepted, then arrays have type array of [toml-type] and empty arrays have a polymorphic type. This does not make them "untyped". Every value has a type. Moreover, the only time the type of an empty array remains polymorphic is when it is a top-level value: some-key = []. When it is part of another array that has at least one value of concrete type, the empty array adopts that type.
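
A sketch of the two cases described above, with a hypothetical second key name. The types in the comments are an interpretation of the proposal, not spec text.

```toml
# Top-level empty array: nothing fixes its component type,
# so it stays polymorphic.
some-key = []

# Empty array next to a concrete one: it adopts the sibling's type,
# so both elements would be typed "array of integer".
other-key = [ [1, 2], [] ]
```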

> Indeed, yay for tuples! Although I cannot see myself using homogeneous arrays knowing that tuples are there, which makes the homogeneous argument harder to maintain. Choice is nice nonetheless.

Tuples and arrays are completely different. Clearly you don't work with statically typed languages. If you only live in the dynamic world, then the distinction between arrays and tuples is not meaningful to you. But configuration file formats should not only cater to dynamically typed languages.

> As for having mixed encodings, what do you expect text editors to show?

TOML doesn't have multiple encodings. It is only UTF-8.

Also...

> Lots of (informally-specified) things you can't encode: keys with = in them

Yes. = are not allowed in keys. The spec says this.

> keygroups with ] in them

Not allowed.

> Which special characters are allowed in keys anyway?

Anything but =, \t or .

Part of the problem with the spec is that it hasn't been formalized—nor does it provide an EBNF. It's still in informal English. Give it some time to breathe before criticizing it for being imprecise.

@espadrine
Author

> What? That's precisely the reason why static languages would have a hard time with heterogeneous arrays. Hence the proposal I linked to in my first response.

Right. I really hope it gets merged!

> Tuples and arrays are completely different. Clearly you don't work with statically typed languages.

Currently working in LAMP on Scala stuff gives me a free pass away from this accusation! ☺

That said, as long as the aforementioned patch gets merged, the design of arrays allows [ [1], ["1"] ], which can't be optimized that well for statically typed languages.
It will also be hard for current implementations to update that requirement, when the patch is applied.

> TOML doesn't have multiple encodings. It is only UTF-8.

The current spec has the example of working around a file encoded in Latin-1. I believe files encoded in non-UTF-8 should simply be rejected as soon as a character clearly isn't UTF-8; that's how I specified dotset.

> Part of the problem with the spec is that it hasn't been formalized—nor does it provide an EBNF. It's still in informal English. Give it some time to breathe before criticizing it for being imprecise.

I wished it was at least as precise as the format I published after him…
and I hope that patch gets merged.

@BurntSushi

> Currently working in LAMP on Scala stuff gives me a free pass away from this accusation! ☺

Ah, nice. I've heard good things about Scala, but I keep my distance from the JVM if I can help it...

> That said, as long as the aforementioned patch gets merged, the design of arrays allows [ [1], ["1"] ], which can't be optimized that well for statically typed languages.

Eh? I think the wording is off here. That array is allowed now. But if my proposal is accepted, then it won't be allowed. (The problem is the use of the word "homogeneous." Arrays in TOML are homogeneous now and they will be if my proposal is accepted. The wording is poor. The real change is how arrays are typed. Lots of people are getting confused by this.)
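
A sketch of the distinction, with hypothetical key names. The second case reflects the proposal as described in this thread and may not match its final wording.

```toml
# Allowed by the current spec: every element is an array,
# so the outer array counts as homogeneous.
now-ok = [ [1], ["1"] ]

# Under the proposal, element types would be compared in full:
# "array of integer" versus "array of string", so the line above
# would be rejected, while this one would still pass.
still-ok = [ [1], [2, 3] ]
```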

> It will also be hard for current implementations to update that requirement, when the patch is applied.

I've already done it for my parser. It took maybe an hour or two. And it was fun. :-)

> The current spec has the example of working around a file encoded in Latin-1. I believe files encoded in non-UTF-8 should simply be rejected as soon as a character clearly isn't UTF-8; that's how I specified dotset.

Dang. That is unfortunate. I hope it is changed to UTF-8 only.

> I wished it was at least as precise as the format I published after him…

I'm not sure how smart mojombo is, but I suspect he left the spec imprecise for a reason. He wasn't trying to impose his idea of what the spec should be, but rather, lay fertile ground for plenty of other sources of input while providing some general principles for future development of the spec. At least, that's my opinion based purely on speculation (and it's what I would've done if I had his celebrity status).

Your format may be nicer, but alas, we plebeians cannot release something and have the instant benefit of some level of adoption. And with a configuration file format, adoption is a big ingredient.

@espadrine
Author

> Eh? I think the wording is off here. That array is allowed now. But if my proposal is accepted, then it won't be allowed.

Yes. What I meant is "Until the patch gets merged…"

> Your format may be nicer, but alas, we plebeians cannot release something and have the instant benefit of some level of adoption. And with a configuration file format, adoption is a big ingredient.

That's the trick, isn't it?
My format is a subset of YAML, so there already are parsers everywhere!
(Of course, having dotset-specific parsers is nicer and safer!)
