@yarcowang
Last active July 21, 2016 21:03
TOML -> TOM: from TOML to TOM

TOM

This is a modified version of the original TOML by Tom Preston-Werner, because I thought:

  • it should be simpler
  • it need not be called a language (as a developer, I'm lost among a lot of programming languages, and tired of it anyway...)

Objectives

TOM aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOM is designed to map unambiguously to a hash table. TOM should be easy to parse into data structures in a wide variety of languages.

Example

; this is a comment

title = "TOM Example" # this is an inline comment

[owner]
name = "Yarco Wang"
dob = 1980-05-17T07:32:00+08:00 # First class dates

[database]
server = "192.168.1.1"
ports = 8001, 8001, 8002
connection_max = 5000
enabled = true

[servers]

[servers.alpha]
ip = "10.0.0.1"
dc = "eqdc10"

[servers.beta]
ip = "10.0.0.2"
dc = "eqdc10"

[clients]
; repeated key as array
data = "gamma", "delta"
data = 1, 2

; Line breaks are OK inside a string
hosts = """
alpha  # this inline comment will be removed from the real "hosts" string
omega \# escaped, so the text here will be included in the "hosts" string
love#xxx this is not a comment
"""

Spec

  • TOM is case sensitive.
  • A TOM file must contain only UTF-8 encoded Unicode characters.
  • Whitespace means tab (0x09) or space (0x20).
  • Newline means LF (0x0A).

Comment

A hash symbol marks the rest of the line as a comment.

# This is a full-line comment
key = "value" # This is a comment at the end of a line
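A minimal Python sketch of the comment rule above. It assumes only the hash form described in this section and values that fit on one line; the function name strip_comment is hypothetical and does not come from any existing TOM or TOML library. A hash inside a quoted string is left alone.

```python
def strip_comment(line):
    """Return `line` without a trailing # comment, ignoring # inside quotes."""
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote is None:
            if ch == "#":
                # Everything from here to the end of the line is a comment.
                return line[:i].rstrip()
            if ch in "\"'":
                quote = ch
        elif quote == '"' and ch == "\\":
            i += 1  # skip the escaped character inside a basic string
        elif ch == quote:
            quote = None
        i += 1
    return line.rstrip()


print(strip_comment('key = "value" # This is a comment at the end of a line'))
```

Multi-line strings would need the parser to carry the quote state across lines, which this sketch deliberately leaves out.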

Key/Value Pair

The primary building block of a TOML document is the key/value pair.

Keys are on the left of the equals sign and values are on the right. Whitespace is ignored around key names and values. The key, equals sign, and value must be on the same line (though some values can be broken over multiple lines).

key = "value"

Keys may be either bare or quoted. Bare keys may only contain letters, numbers, underscores, and dashes (A-Za-z0-9_-). Note that bare keys are allowed to be composed of only digits, e.g. 1234. Quoted keys follow the exact same rules as either basic strings or literal strings and allow you to use a much broader set of key names. Best practice is to use bare keys except when absolutely necessary.

key = "value"
bare_key = "value"
bare-key = "value"
1234 = "value"

"127.0.0.1" = "value"
"character encoding" = "value"
"ʎǝʞ" = "value"
'key2' = "value"
'quoted "value"' = "value"

A bare key must be non-empty, but an empty quoted key is allowed (though discouraged).

= "no key name"  # INVALID
"" = "blank"     # VALID but discouraged
'' = 'blank'     # VALID but discouraged
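The bare-key rule can be checked with a single regular expression. The sketch below is one possible way to do it in Python; the helper name is_bare_key is an assumption made for illustration.

```python
import re

# Bare keys: one or more ASCII letters, digits, underscores, or dashes.
BARE_KEY = re.compile(r"[A-Za-z0-9_-]+")


def is_bare_key(key):
    """Return True if `key` can be written without quotes."""
    return BARE_KEY.fullmatch(key) is not None


for candidate in ["bare_key", "bare-key", "1234", "127.0.0.1", "character encoding", ""]:
    print(repr(candidate), is_bare_key(candidate))
```

A writer could use such a check to decide when a key has to be quoted, in line with the advice to prefer bare keys.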

Values may be of the following types: String, Integer, Float, Boolean, Datetime, Array, or Inline Table.

String

There are four ways to express strings: basic, multi-line basic, literal, and multi-line literal. All strings must contain only valid UTF-8 characters.

Basic strings are surrounded by quotation marks. Any Unicode character may be used except those that must be escaped: quotation mark, backslash, and the control characters (U+0000 to U+001F).

str = "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."

For convenience, some popular characters have a compact escape sequence.

\b         - backspace       (U+0008)
\t         - tab             (U+0009)
\n         - linefeed        (U+000A)
\f         - form feed       (U+000C)
\r         - carriage return (U+000D)
\"         - quote           (U+0022)
\\         - backslash       (U+005C)
\uXXXX     - unicode         (U+XXXX)
\UXXXXXXXX - unicode         (U+XXXXXXXX)

Any Unicode character may be escaped with the \uXXXX or \UXXXXXXXX forms. The escape codes must be valid Unicode scalar values.

All other escape sequences not listed above are reserved and, if used, TOML should produce an error.
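As an illustration of the escape table, here is a small Python sketch that decodes the body of a basic string (the text between the quotation marks). The name unescape_basic is hypothetical; the function raises ValueError for reserved escapes and for code points that are not Unicode scalar values.

```python
import string

SIMPLE_ESCAPES = {
    "b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r",
    '"': '"', "\\": "\\",
}


def unescape_basic(body):
    """Decode the escape sequences in the body of a basic string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        code = body[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in "uU":
            width = 4 if code == "u" else 8
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or any(c not in string.hexdigits for c in digits):
                raise ValueError("malformed unicode escape")
            value = int(digits, 16)
            # Surrogates and values above U+10FFFF are not scalar values.
            if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
                raise ValueError("not a Unicode scalar value")
            out.append(chr(value))
            i += 2 + width
        else:
            raise ValueError("reserved escape sequence: \\" + code)
    return "".join(out)


print(unescape_basic("Name\\tJos\\u00E9\\nLocation\\tSF."))
```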

Sometimes you need to express passages of text (e.g. translation files) or would like to break up a very long string into multiple lines. TOML makes this easy. Multi-line basic strings are surrounded by three quotation marks on each side and allow newlines. A newline immediately following the opening delimiter will be trimmed. All other whitespace and newline characters remain intact.

str1 = """
Roses are red
Violets are blue"""

TOML parsers should feel free to normalize newline to whatever makes sense for their platform.

# On a Unix system, the above multi-line string will most likely be the same as:
str2 = "Roses are red\nViolets are blue"

# On a Windows system, it will most likely be equivalent to:
str3 = "Roses are red\r\nViolets are blue"

For writing long strings without introducing extraneous whitespace, end a line with a \. The \ will be trimmed along with all whitespace (including newlines) up to the next non-whitespace character or closing delimiter. If the first characters after the opening delimiter are a backslash and a newline, then they will both be trimmed along with all whitespace and newlines up to the next non-whitespace character or closing delimiter. All of the escape sequences that are valid for basic strings are also valid for multi-line basic strings.

# The following strings are byte-for-byte equivalent:
str1 = "The quick brown fox jumps over the lazy dog."

str2 = """
The quick brown \


  fox jumps over \
    the lazy dog."""

str3 = """\
       The quick brown \
       fox jumps over \
       the lazy dog.\
       """

In multi-line basic strings, any Unicode character may be used except those that must be escaped: backslash and the control characters (U+0000 to U+001F). Quotation marks need not be escaped unless their presence would create a premature closing delimiter.
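The two trimming rules for multi-line basic strings (the newline right after the opening delimiter, and a line-ending backslash) can be sketched in Python as follows. The function name fold_multiline_basic is hypothetical, the input is assumed to be the raw text between the triple quotation marks, and for brevity the sketch does not decode other escapes or treat an escaped backslash before a newline specially.

```python
import re


def fold_multiline_basic(raw):
    """Apply the trimming rules to the raw body of a multi-line basic string."""
    # A newline immediately following the opening delimiter is trimmed.
    if raw.startswith("\n"):
        raw = raw[1:]
    # A backslash at the end of a line is removed together with all
    # whitespace and newlines up to the next non-whitespace character.
    return re.sub(r"\\\n[ \t\n]*", "", raw)


body = "\nThe quick brown \\\n\n\n  fox jumps over \\\n    the lazy dog."
print(fold_multiline_basic(body))
```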

If you're a frequent specifier of Windows paths or regular expressions, then having to escape backslashes quickly becomes tedious and error prone. To help, TOML supports literal strings where there is no escaping allowed at all. Literal strings are surrounded by single quotes. Like basic strings, they must appear on a single line:

# What you see is what you get.
winpath  = 'C:\Users\nodejs\templates'
winpath2 = '\\ServerX\admin$\system32\'
quoted   = 'Tom "Dubs" Preston-Werner'
regex    = '<\i\c*\s*>'

Since there is no escaping, there is no way to write a single quote inside a literal string enclosed by single quotes. Luckily, TOML supports a multi-line version of literal strings that solves this problem. Multi-line literal strings are surrounded by three single quotes on each side and allow newlines. Like literal strings, there is no escaping whatsoever. A newline immediately following the opening delimiter will be trimmed. All other content between the delimiters is interpreted as-is without modification.

regex2 = '''I [dw]on't need \d{2} apples'''
lines  = '''
The first newline is
trimmed in raw strings.
   All other whitespace
   is preserved.
'''

For binary data it is recommended that you use Base64 or another suitable ASCII or UTF-8 encoding. The handling of that encoding will be application specific.

Integer

Integers are whole numbers. Positive numbers may be prefixed with a plus sign. Negative numbers are prefixed with a minus sign.

int1 = +99
int2 = 42
int3 = 0
int4 = -17

For large numbers, you may use underscores to enhance readability. Each underscore must be surrounded by at least one digit on each side.

int5 = 1_000
int6 = 5_349_221
int7 = 1_2_3_4_5     # valid but inadvisable

Leading zeros are not allowed. Hex, octal, and binary forms are not allowed. Values such as "infinity" and "not a number" that cannot be expressed as a series of digits are not allowed.

64 bit (signed long) range expected (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
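The integer rules (optional sign, underscores only between digits, no leading zeros, 64-bit signed range) fit in a short Python sketch. The name parse_integer is an assumption made for illustration.

```python
import re

# Optional sign, then either a lone 0 or a non-zero digit followed by
# digits that may each be preceded by a single underscore.
INT_RE = re.compile(r"[+-]?(?:0|[1-9](?:_?[0-9])*)")
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def parse_integer(text):
    """Parse an integer token, raising ValueError if it breaks the rules."""
    if INT_RE.fullmatch(text) is None:
        raise ValueError("invalid integer: " + text)
    value = int(text.replace("_", ""))
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("integer out of 64-bit range: " + text)
    return value


print(parse_integer("5_349_221"))
```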

Float

A float consists of an integer part (which follows the same rules as integer values) followed by a fractional part and/or an exponent part. If both a fractional part and exponent part are present, the fractional part must precede the exponent part.

# fractional
flt1 = +1.0
flt2 = 3.1415
flt3 = -0.01

# exponent
flt4 = 5e+22
flt5 = 1e6
flt6 = -2E-2

# both
flt7 = 6.626e-34

A fractional part is a decimal point followed by one or more digits.

An exponent part is an E (upper or lower case) followed by an integer part (which follows the same rules as integer values).

Similar to integers, you may use underscores to enhance readability. Each underscore must be surrounded by at least one digit on each side.

flt8 = 9_224_617.445_991_228_313

64-bit (double) precision expected.
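A possible Python sketch of the float grammar described above: an integer part, then a fractional part and/or an exponent part. The name parse_float is hypothetical, and the result is a Python float, which is a 64-bit double on common platforms.

```python
import re

INT_PART = r"(?:0|[1-9](?:_?[0-9])*)"
FLOAT_RE = re.compile(
    r"[+-]?" + INT_PART
    + r"(?P<frac>\.[0-9](?:_?[0-9])*)?"           # decimal point plus digits
    + r"(?P<exp>[eE][+-]?" + INT_PART + r")?"     # E plus an integer part
)


def parse_float(text):
    """Parse a float token, raising ValueError if it breaks the rules."""
    m = FLOAT_RE.fullmatch(text)
    # At least one of the fractional part and the exponent part is required.
    if m is None or (m.group("frac") is None and m.group("exp") is None):
        raise ValueError("invalid float: " + text)
    return float(text.replace("_", ""))


print(parse_float("6.626e-34"))
```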

Boolean

Booleans are just the tokens you're used to. Always lowercase.

bool1 = true
bool2 = false

Datetime

There are three ways to express a datetime. The first is simply by using the RFC 3339 spec.

date1 = 1979-05-27T07:32:00Z
date2 = 1979-05-27T00:32:00-07:00
date3 = 1979-05-27T00:32:00.999999-07:00

You may omit the local offset and let the parser or host application decide that information. A good default is to use the host machine's local offset.

1979-05-27T07:32:00
1979-05-27T00:32:00.999999

If you only care about the day, you can omit the local offset and the time, letting the parser or host application decide both. Good defaults are to use the host machine's local offset and 00:00:00.

1979-05-27

The precision of fractional seconds is implementation specific, but at least millisecond precision is expected.
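The three datetime forms can be handled with Python's standard library, as in the sketch below. The function name parse_datetime and its default_tz parameter are assumptions; the sketch fills in UTC when the offset is omitted so that the result is deterministic, whereas an application following the advice above would pass the host machine's local offset instead.

```python
from datetime import datetime, timezone


def parse_datetime(text, default_tz=timezone.utc):
    """Parse a full datetime, a datetime without offset, or a bare date."""
    # Older versions of datetime.fromisoformat do not accept a trailing Z.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)  # a bare date yields 00:00:00
    if value.tzinfo is None:
        value = value.replace(tzinfo=default_tz)
    return value


print(parse_datetime("1979-05-27T00:32:00-07:00"))
print(parse_datetime("1979-05-27"))
```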

Array

Arrays are square brackets with values inside. Whitespace is ignored. Elements are separated by commas. Data types may not be mixed (different ways to define strings should be considered the same type, and so should arrays with different element types).

arr1 = [ 1, 2, 3 ]
arr2 = [ "red", "yellow", "green" ]
arr3 = [ [ 1, 2 ], [3, 4, 5] ]
arr4 = [ "all", 'strings', """are the same""", '''type''']
arr5 = [ [ 1, 2 ], ["a", "b", "c"] ]

arr6 = [ 1, 2.0 ] # INVALID

Arrays can also be multiline. So in addition to ignoring whitespace, arrays also ignore newlines, and comments before those newlines, between the brackets. Terminating commas are ok before the closing bracket.

arr7 = [
  1, 2, 3
]

arr8 = [
  1,
  2, # this is ok
]
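The rule that data types may not be mixed can be checked on already-parsed values. The Python sketch below uses hypothetical helper names (element_kind, is_homogeneous); all strings count as one type and all arrays count as one type, whatever their elements.

```python
def element_kind(value):
    """Map a parsed Python value to the name of its value type."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "table"
    return type(value).__name__  # e.g. datetime


def is_homogeneous(items):
    """Return True if all elements of an array share one value type."""
    return len({element_kind(v) for v in items}) <= 1


print(is_homogeneous([[1, 2], ["a", "b", "c"]]))
print(is_homogeneous([1, 2.0]))
```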

Table

Tables (also known as hash tables or dictionaries) are collections of key/value pairs. They appear in square brackets on a line by themselves. You can tell them apart from arrays because arrays are only ever values.

[table]

Under that, and until the next table or EOF are the key/values of that table. Key/value pairs within tables are not guaranteed to be in any specific order.

[table-1]
key1 = "some string"
key2 = 123

[table-2]
key1 = "another string"
key2 = 456

Dots are prohibited in bare keys because dots are used to signify nested tables. Naming rules for each dot separated part are the same as for keys (see definition of Key/Value Pairs).

[dog."tater.man"]
type = "pug"

In JSON land, that would give you the following structure:

{ "dog": { "tater.man": { "type": "pug" } } }

Whitespace around dot-separated parts is ignored, however, best practice is to not use any extraneous whitespace.

[a.b.c]            # this is best practice
[ d.e.f ]          # same as [d.e.f]
[ g .  h  . i ]    # same as [g.h.i]
[ j . "ʞ" . 'l' ]  # same as [j."ʞ".'l']
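To illustrate how a table name breaks into its parts, here is a Python sketch that splits the text between the square brackets on dots while honouring quoted parts and ignoring surrounding tabs and spaces. The name split_table_name is hypothetical, and escape sequences inside double-quoted parts are not decoded in this sketch.

```python
import re

# One part: a double-quoted, single-quoted, or bare key, with optional
# tabs and spaces around it.
PART = re.compile(r'[ \t]*(?:"([^"]*)"|\'([^\']*)\'|([A-Za-z0-9_-]+))[ \t]*')


def split_table_name(name):
    """Split the text between [ and ] into its dot-separated parts."""
    parts = []
    pos = 0
    while True:
        m = PART.match(name, pos)
        if m is None:
            raise ValueError("invalid table name: " + repr(name))
        parts.append(next(g for g in m.groups() if g is not None))
        pos = m.end()
        if pos == len(name):
            return parts
        if name[pos] != ".":
            raise ValueError("invalid table name: " + repr(name))
        pos += 1  # step over the dot and read the next part


print(split_table_name('dog."tater.man"'))
print(split_table_name(" g .  h  . i "))
```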

You don't need to specify all the super-tables if you don't want to. TOML knows how to do it for you.

# [x] you
# [x.y] don't
# [x.y.z] need these
[x.y.z.w] # for this to work

Empty tables are allowed and simply have no key/value pairs within them.

As long as a super-table hasn't been directly defined and hasn't defined a specific key, you may still write to it.

[a.b]
c = 1

[a]
d = 2

You cannot define any key or table more than once. Doing so is invalid.

# DO NOT DO THIS

[a]
b = 1

[a]
c = 2

# DO NOT DO THIS EITHER

[a]
b = 1

[a.b]
c = 2

All table names must be non-empty.

[]     # INVALID
[a.]   # INVALID
[a..b] # INVALID
[.b]   # INVALID
[.]    # INVALID
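The table rules above (implicit super-tables, writing to a super-table that was only created implicitly, and rejecting duplicates) can be sketched with nested Python dictionaries. The function define_table and the defined set are hypothetical names; the path is assumed to be already split into its parts, and the sketch does not cover every key collision a full parser would need to detect.

```python
def define_table(root, path, defined):
    """Create the table named by `path` inside `root` and return it.

    `defined` is the set of table paths that appeared in [brackets] so far.
    """
    if not path or any(part == "" for part in path):
        raise ValueError("table names must be non-empty")
    key = tuple(path)
    if key in defined:
        raise ValueError("table defined more than once: " + ".".join(path))
    node = root
    for part in path:
        # Missing super-tables are created on the way down.
        child = node.setdefault(part, {})
        if not isinstance(child, dict):
            raise ValueError("key already holds a value: " + part)
        node = child
    defined.add(key)
    return node


root, defined = {}, set()
define_table(root, ["a", "b"], defined)["c"] = 1
define_table(root, ["a"], defined)["d"] = 2  # allowed: [a] was only implicit
print(root)
```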

Inline Table

Inline tables provide a more compact syntax for expressing tables. They are especially useful for grouped data that can otherwise quickly become verbose. Inline tables are enclosed in curly braces { and }. Within the braces, zero or more comma separated key/value pairs may appear. Key/value pairs take the same form as key/value pairs in standard tables. All value types are allowed, including inline tables.

Inline tables are intended to appear on a single line. No newlines are allowed between the curly braces unless they are valid within a value. Even so, it is strongly discouraged to break an inline table onto multiple lines. If you find yourself gripped with this desire, it means you should be using standard tables.

name = { first = "Tom", last = "Preston-Werner" }
point = { x = 1, y = 2 }

The inline tables above are identical to the following standard table definitions:

[name]
first = "Tom"
last = "Preston-Werner"

[point]
x = 1
y = 2

Array of Tables

The last type that has not yet been expressed is an array of tables. These can be expressed by using a table name in double brackets. Each table with the same double bracketed name will be an element in the array. The tables are inserted in the order encountered. A double bracketed table without any key/value pairs will be considered an empty table.

[[products]]
name = "Hammer"
sku = 738594937

[[products]]

[[products]]
name = "Nail"
sku = 284758393
color = "gray"

In JSON land, that would give you the following structure.

{
  "products": [
    { "name": "Hammer", "sku": 738594937 },
    { },
    { "name": "Nail", "sku": 284758393, "color": "gray" }
  ]
}

You can create nested arrays of tables as well. Just use the same double bracket syntax on sub-tables. Each double-bracketed sub-table will belong to the most recently defined table element above it.

[[fruit]]
  name = "apple"

  [fruit.physical]
    color = "red"
    shape = "round"

  [[fruit.variety]]
    name = "red delicious"

  [[fruit.variety]]
    name = "granny smith"

[[fruit]]
  name = "banana"

  [[fruit.variety]]
    name = "plantain"

The above TOML maps to the following JSON.

{
  "fruit": [
    {
      "name": "apple",
      "physical": {
        "color": "red",
        "shape": "round"
      },
      "variety": [
        { "name": "red delicious" },
        { "name": "granny smith" }
      ]
    },
    {
      "name": "banana",
      "variety": [
        { "name": "plantain" }
      ]
    }
  ]
}

Attempting to define a normal table with the same name as an already established array must produce an error at parse time.

# INVALID TOML DOC
[[fruit]]
  name = "apple"

  [[fruit.variety]]
    name = "red delicious"

  # This table conflicts with the previous table
  [fruit.variety]
    name = "granny smith"

You may also use inline tables where appropriate:

points = [ { x = 1, y = 2, z = 3 },
           { x = 7, y = 8, z = 9 },
           { x = 2, y = 4, z = 8 } ]
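The array-of-tables behaviour can be sketched in Python with lists of dictionaries. All names here (resolve_parent, append_array_table, open_table) are hypothetical: a double-bracketed header appends a new table to the array, and any path that passes through an array refers to its most recently added element. Defining a normal table where an array already exists raises an error.

```python
def resolve_parent(root, parts):
    """Walk `parts`, stepping into the last element of any array of tables."""
    node = root
    for part in parts:
        child = node.setdefault(part, {})
        if isinstance(child, list):
            child = child[-1]  # most recently defined table element
        if not isinstance(child, dict):
            raise ValueError("key already holds a value: " + part)
        node = child
    return node


def append_array_table(root, path):
    """Handle a [[path]] header: append a new table and return it."""
    parent = resolve_parent(root, path[:-1])
    array = parent.setdefault(path[-1], [])
    if not isinstance(array, list):
        raise ValueError("not an array of tables: " + ".".join(path))
    table = {}
    array.append(table)
    return table


def open_table(root, path):
    """Handle a [path] header under a possible array-of-tables parent."""
    parent = resolve_parent(root, path[:-1])
    if isinstance(parent.get(path[-1]), list):
        raise ValueError("already an array of tables: " + ".".join(path))
    return parent.setdefault(path[-1], {})


doc = {}
append_array_table(doc, ["fruit"])["name"] = "apple"
append_array_table(doc, ["fruit", "variety"])["name"] = "red delicious"
append_array_table(doc, ["fruit"])["name"] = "banana"
append_array_table(doc, ["fruit", "variety"])["name"] = "plantain"
print(doc)
```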

Filename Extension

TOML files should use the extension .toml.

Comparison with Other Formats

In some ways TOML is very similar to JSON: simple, well-specified, and maps easily to ubiquitous data types. JSON is great for serializing data that will mostly be read and written by computer programs. Where TOML differs from JSON is its emphasis on being easy for humans to read and write. Comments are a good example: they serve no purpose when data is being sent from one program to another, but are very helpful in a configuration file that may be edited by hand.

The YAML format is oriented towards configuration files just like TOML. For many purposes, however, YAML is an overly complex solution. TOML aims for simplicity, a goal which is not apparent in the YAML specification: http://www.yaml.org/spec/1.2/spec.html

The INI format is also frequently used for configuration files. The format is not standardized, however, and usually does not handle more than one or two levels of nesting.

Get Involved

Documentation, bug reports, pull requests, and all other contributions are welcome!

Projects using TOML

  • Cargo - The Rust language's package manager.
  • InfluxDB - Distributed time series database.
  • Heka - Stream processing system by Mozilla.
  • Hugo - Static site generator in Go.
  • bloom.api - Create APIs out of public datasources.
  • MeTA - Modern C++ data science toolkit.
  • CUAUV - Cornell University Autonomous Underwater Vehicle
  • MCPhoton - Multi-threaded Java minecraft server.

Implementations

If you have an implementation, send a pull request adding to this list. Please note the version tag that your parser supports in your Readme.

