@GeorgeHernandez
Created December 4, 2018 16:06
Notes on Tom's Obvious, Minimal Language (TOML), a format for configuration files.
# TOML Notes
# Summary: Notes on Tom's Obvious, Minimal Language (TOML), a format for configuration files. These notes are a regurgitation of the spec for the sake of learning TOML.
# Authors: George Hernandez
# Audience: Public
# INTRO
# TOML's syntax largely consists of: key = "value" pairs, [section names], and # comments.
# It specifies a list of supported data types: String, Integer, Float, Boolean, Datetime, Array, and Table.
# TOML is case sensitive.
# A TOML file must be a valid UTF-8 encoded Unicode document.
# Whitespace means tab (0x09) or space (0x20).
# Newline means LF (0x0A) or CRLF (0x0D0A).
# TOML files should use the extension .toml.
# When transferring TOML files over the internet, the appropriate MIME type is application/toml.
# Comparison with Other Formats
# In some ways TOML is very similar to JSON: simple, well-specified, and maps easily to ubiquitous data types. JSON is great for serializing data that will mostly be read and written by computer programs. Where TOML differs from JSON is its emphasis on being easy for humans to read and write. Comments are a good example: they serve no purpose when data is being sent from one program to another, but are very helpful in a configuration file that may be edited by hand.
# The YAML format is oriented towards configuration files just like TOML. For many purposes, however, YAML is an overly complex solution. TOML aims for simplicity, a goal which is not apparent in the YAML specification: http://www.yaml.org/spec/1.2/spec.html
# The INI format is also frequently used for configuration files. The format is not standardized, however, and usually does not handle more than one or two levels of nesting.
# REFERENCES
# https://en.wikipedia.org/wiki/TOML
# https://github.com/toml-lang/toml
# EXAMPLE
title = "TOML Example"
[owner]
name = "Tom Preston-Werner"
dob = 1979-05-27T07:32:00-08:00 # First class dates
[database]
server = "192.168.1.1"
ports = [ 8001, 8001, 8002 ]
connection_max = 5000
enabled = true
[servers]
# Indentation (tabs and/or spaces) is allowed but not required
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"
  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"
[clients]
data = [ ["gamma", "delta"], [1, 2] ]
# Line breaks are OK when inside arrays
hosts = [
"alpha",
"omega"
]
# KEYS
# Keys are followed by an equal sign (=) and a value.
# There are 3 types of keys.
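# BARE KEYS are limited to ASCII letters, ASCII digits, underscores, and dashes (A-Za-z0-9_-). A bare key made only of digits, such as 123, is still read as a string. EG: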
Bare-key = "foo"
123 = 'foo'
Bare_key = # No value, therefore invalid
= 'Empty bare key, therefore invalid'
# QUOTED KEYS follow the rules of TOML strings. EG:
"Quoted key" = 'foo'
'' = "Empty quoted keys are valid but discouraged"
# DOTTED KEYS are bare or quoted keys joined with a dot. EG:
dog.name = "Fido"
dog."age" = 7
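# As a sketch, the two dotted keys above are equivalent to this JSON structure:
# { "dog": { "name": "Fido", "age": 7 } }
# A dot inside a quoted part belongs to that part and does not add a level. The key names below are made up for illustration. EG:
site."example.com" = true
# In JSON land, that would be: { "site": { "example.com": true } }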
# Redefining a key is invalid. EG:
bad = 1
bad = 2
# STRINGS
# There are 4 types of strings.
# BASIC STRINGS are surrounded by quotation marks (").
# Must appear on a single line.
# Any Unicode character may be used but the following must be escaped:
# quotes ("), backslash (\), control characters (U+0000 to U+001F, U+007F). EG:
str = "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
# The following escapes are valid:
# \b - backspace (U+0008)
# \t - tab (U+0009)
# \n - linefeed (U+000A)
# \f - form feed (U+000C)
# \r - carriage return (U+000D)
# \" - quote (U+0022)
# \\ - backslash (U+005C)
# \uXXXX - unicode (U+XXXX)
# \UXXXXXXXX - unicode (U+XXXXXXXX)
# MULTI-LINE BASIC STRINGS are surrounded by 3 quotation marks (""").
# A newline immediately following the opening delimiter will be trimmed. All other whitespace and newline characters remain intact. EG:
str1 = """
Roses are red
Violets are blue"""
# Quotation marks need not be escaped unless their presence would create a premature closing delimiter.
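# A small sketch of two multi-line basic string behaviors (key names are illustrative). First, a lone quotation mark inside the string needs no escape. Second, per the spec, a backslash as the last non-whitespace character on a line (a "line ending backslash") trims the newline and all whitespace up to the next non-whitespace character. EG:
str2 = """She said "hi" and left."""
str3 = """
The quick brown \
  fox jumps over \
  the lazy dog."""
# str3 should hold the same value as "The quick brown fox jumps over the lazy dog."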
# LITERAL STRINGS are surrounded by single quotes (').
# Must appear on a single line.
# No escaping is allowed.
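# Because nothing is escaped, literal strings suit Windows paths and regular expressions. The trade-off is that a single quote cannot appear inside one. A few illustrative values, with arbitrary key names. EG:
winpath = 'C:\Users\nodejs\templates'
quoted = 'Tom "Dubs" Preston-Werner'
regex = '<\i\c*\s*>'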
# MULTI-LINE LITERAL STRINGS are surrounded by 3 single quotes (''').
# A newline immediately following the opening delimiter will be trimmed. All other content between the delimiters is interpreted as-is without modification.
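# A sketch of both points above, with arbitrary key names: a single quote can appear inside the triple quotes, and only the first newline is trimmed. EG:
regex2 = '''I [dw]on't need \d{2} apples'''
lines = '''
The first newline is
trimmed in raw strings.
   All other whitespace
   is preserved.
'''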
# INTEGERS
# Integers are whole numbers. Positive numbers may be prefixed with a plus sign. Negative numbers are prefixed with a minus sign. EG:
int1 = +99
int2 = 42
int3 = 0
int4 = -17
# For large numbers, you may use underscores between digits to enhance readability. Each underscore must be surrounded by at least one digit on each side. EG:
int5 = 1_000
int6 = 5_349_221
int7 = 1_2_3_4_5 # VALID but discouraged
# Leading zeros are not allowed. Integer values -0 and +0 are valid and identical to an unprefixed zero.
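# A few counter-examples that the rules above exclude (key names are illustrative). EG:
int8 = 007 # INVALID, leading zeros
int9 = 1__000 # INVALID, an underscore next to another underscore
int10 = 1000_ # INVALID, no digit after the underscore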
# Non-negative integer values may also be expressed in hexadecimal, octal, or binary. In these formats, leading zeros are allowed (after the prefix). Hex values are case insensitive. Underscores are allowed between digits (but not between the prefix and the value). EG:
# hexadecimal with prefix `0x`
hex1 = 0xDEADBEEF
hex2 = 0xdeadbeef
hex3 = 0xdead_beef
# octal with prefix `0o`
oct1 = 0o01234567
oct2 = 0o755 # useful for Unix file permissions
# binary with prefix `0b`
bin1 = 0b11010110
# 64 bit (signed long) range expected (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807).
# FLOATS
# Floats should be implemented as IEEE 754 binary64 values.
# A float consists of an integer part (which follows the same rules as integer values) followed by a fractional part and/or an exponent part. If both a fractional part and exponent part are present, the fractional part must precede the exponent part. EG:
# fractional
flt1 = +1.0
flt2 = 3.1415
flt3 = -0.01
# exponent
flt4 = 5e+22
flt5 = 1e6
flt6 = -2E-2
# both
flt7 = 6.626e-34
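# The integer part is required, and a decimal point must be followed by at least one digit, so the following sketches should be rejected (key names are illustrative). EG:
flt_bad1 = .7 # INVALID, no integer part
flt_bad2 = 7. # INVALID, no digit after the decimal point
flt_bad3 = 3.e+20 # INVALID, no digit between the decimal point and the exponent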
# Similar to integers, you may use underscores to enhance readability. Each underscore must be surrounded by at least one digit. EG:
flt8 = 9_224_617.445_991_228_313
# Float values -0.0 and +0.0 are valid and should map according to IEEE 754.
# Special float values can also be expressed. They are always lowercase.
# infinity
sf1 = inf # positive infinity
sf2 = +inf # positive infinity
sf3 = -inf # negative infinity
# not a number
sf4 = nan # actual sNaN/qNaN encoding is implementation specific
sf5 = +nan # same as `nan`
sf6 = -nan # valid, actual encoding is implementation specific
# BOOLEANS
# Booleans are just the tokens you're used to. Always lowercase. EG:
bool1 = true
bool2 = false
# DATES AND TIMES
# There are 4 types of dates and times.
# OFFSET DATE-TIME
# To unambiguously represent a specific instant in time, you may use an RFC 3339 formatted date-time with offset. EG:
odt1 = 1979-05-27T07:32:00Z
odt2 = 1979-05-27T00:32:00-07:00
odt3 = 1979-05-27T00:32:00.999999-07:00
# For the sake of readability, you may replace the T delimiter between date and time with a space (as permitted by RFC 3339 section 5.6). EG:
odt4 = 1979-05-27 07:32:00Z
# The precision of fractional seconds is implementation specific, but at least millisecond precision is expected. If the value contains greater precision than the implementation can support, the additional precision must be truncated, not rounded.
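# A worked example of the truncation rule, assuming a hypothetical implementation that keeps only milliseconds:
# 1979-05-27T00:32:00.999999-07:00 is read as 1979-05-27T00:32:00.999-07:00
# It is not rounded up to 1979-05-27T00:32:01.000-07:00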
# LOCAL DATE-TIME
# If you omit the offset from an RFC 3339 formatted date-time, it will represent the given date-time without any relation to an offset or timezone. It cannot be converted to an instant in time without additional information. Conversion to an instant, if required, is implementation specific. EG:
ldt1 = 1979-05-27T07:32:00
ldt2 = 1979-05-27T00:32:00.999999
# The precision of fractional seconds is implementation specific, but at least millisecond precision is expected. If the value contains greater precision than the implementation can support, the additional precision must be truncated, not rounded.
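# The T delimiter may be replaced with a space here too; the spec's grammar uses the same delimiter rule for local date-times as for offset date-times. EG: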
ldt3 = 1979-05-27 00:32:00.999999
# LOCAL DATE
# If you include only the date portion of an RFC 3339 formatted date-time, it will represent that entire day without any relation to an offset or timezone. EG:
ld1 = 1979-05-27
# LOCAL TIME
# If you include only the time portion of an RFC 3339 formatted date-time, it will represent that time of day without any relation to a specific day or any offset or timezone. EG:
lt1 = 07:32:00
lt2 = 00:32:00.999999
# The precision of fractional seconds is implementation specific, but at least millisecond precision is expected. If the value contains greater precision than the implementation can support, the additional precision must be truncated, not rounded.
# ARRAYS
# Arrays are square brackets with values inside. Whitespace is ignored. Elements are separated by commas. In TOML v0.5.0, data types may NOT be mixed (different ways to define strings should be considered the same type, and so should arrays with different element types); TOML v1.0.0 later dropped this restriction. EG:
arr1 = [ 1, 2, 3 ]
arr2 = [ "red", "yellow", "green" ]
arr3 = [ [ 1, 2 ], [3, 4, 5] ]
arr4 = [ "all", 'strings', """are the same""", '''type''']
arr5 = [ [ 1, 2 ], ["a", "b", "c"] ]
arr6 = [ 1, 2.0 ] # INVALID in TOML v0.5.0 (integer mixed with float); allowed since v1.0.0
# Arrays can also be multiline. Terminating commas (also called trailing commas) are ok after the last value of the array. There can be an arbitrary number of newlines and comments before a value and before the closing bracket. EG:
arr7 = [
1, 2, 3
]
arr8 = [
1,
2, # this is ok
]
# TABLES
# Tables (also known as hash tables or dictionaries) are collections of key/value pairs. They appear in square brackets on a line by themselves. You can tell them apart from arrays because arrays are only ever values. EG:
[table]
# Under that, and until the next table or EOF are the key/values of that table. Key/value pairs within tables are not guaranteed to be in any specific order.
[table-1]
key1 = "some string"
key2 = 123
[table-2]
key1 = "another string"
key2 = 456
# Naming rules for tables are the same as for keys (see definition of Keys above). EG:
[dog."tater.man"]
type.name = "pug"
# In JSON land, that would give you the following structure. EG:
{ "dog": { "tater.man": { "type": { "name": "pug" } } } }
# Whitespace around the key is ignored, however, best practice is to not use any extraneous whitespace. EG:
[a.b.c] # this is best practice
[ d.e.f ] # same as [d.e.f]
[ g . h . i ] # same as [g.h.i]
[ j . "ʞ" . 'l' ] # same as [j."ʞ".'l']
# You don't need to specify all the super-tables if you don't want to. TOML knows how to do it for you. EG:
# [x] you
# [x.y] don't
# [x.y.z] need these
[x.y.z.w] # for this to work
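# In JSON land, the single header above would give you the following structure (a sketch; each implied super-table holds nothing but its child):
# { "x": { "y": { "z": { "w": {} } } } }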
# Empty tables are allowed and simply have no key/value pairs within them.
# Like keys, you cannot define any table more than once. Doing so is invalid. EG:
# DO NOT DO THIS:
[a]
b = 1
[a]
c = 2
# DO NOT DO THIS EITHER:
[a]
b = 1
[a.b]
c = 2
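# A sketch of valid alternatives, using made-up table names so they do not clash with the examples above. For the first case, keep all of a table's keys under one header. For the second, choose a sub-table name that is not already a key. EG:
[a2]
b = 1
c = 2
[a2.d]
e = 3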
# INLINE TABLES
# Inline tables provide a more compact syntax for expressing tables. They are especially useful for grouped data that can otherwise quickly become verbose. Inline tables are enclosed in curly braces { and }. Within the braces, zero or more comma separated key/value pairs may appear. Key/value pairs take the same form as key/value pairs in standard tables. All value types are allowed, including inline tables.
# Inline tables are intended to appear on a single line. No newlines are allowed between the curly braces unless they are valid within a value. Even so, it is strongly discouraged to break an inline table onto multiple lines. If you find yourself gripped with this desire, it means you should be using standard tables. EG:
name = { first = "Tom", last = "Preston-Werner" }
point = { x = 1, y = 2 }
animal = { type.name = "pug" }
# The inline tables above are identical to the following standard table definitions:
[name]
first = "Tom"
last = "Preston-Werner"
[point]
x = 1
y = 2
[animal]
type.name = "pug"
# ARRAY OF TABLES
# The last type that has not yet been expressed is an array of tables. These can be expressed by using a table name in double brackets. Each table with the same double bracketed name will be an element in the array. The tables are inserted in the order encountered. A double bracketed table without any key/value pairs will be considered an empty table. EG:
[[products]]
name = "Hammer"
sku = 738594937
[[products]]
[[products]]
name = "Nail"
sku = 284758393
color = "gray"
# In JSON land, that would give you the following structure.
{
  "products": [
    { "name": "Hammer", "sku": 738594937 },
    { },
    { "name": "Nail", "sku": 284758393, "color": "gray" }
  ]
}
# You can create nested arrays of tables as well. Just use the same double bracket syntax on sub-tables. Each double-bracketed sub-table will belong to the most recently defined table element above it. EG:
[[fruit]]
name = "apple"
[fruit.physical]
color = "red"
shape = "round"
[[fruit.variety]]
name = "red delicious"
[[fruit.variety]]
name = "granny smith"
[[fruit]]
name = "banana"
[[fruit.variety]]
name = "plantain"
# The above TOML maps to the following JSON.
{
  "fruit": [
    {
      "name": "apple",
      "physical": {
        "color": "red",
        "shape": "round"
      },
      "variety": [
        { "name": "red delicious" },
        { "name": "granny smith" }
      ]
    },
    {
      "name": "banana",
      "variety": [
        { "name": "plantain" }
      ]
    }
  ]
}
# Attempting to append to a statically defined array, even if that array is empty or of compatible type, must produce an error at parse time. EG:
# INVALID TOML DOC
fruit = []
[[fruit]] # Not allowed
# Attempting to define a normal table with the same name as an already established array must produce an error at parse time.
# INVALID TOML DOC
[[fruit]]
name = "apple"
[[fruit.variety]]
name = "red delicious"
# This table conflicts with the previous table
[fruit.variety]
name = "granny smith"
# You may also use inline tables where appropriate:
points = [ { x = 1, y = 2, z = 3 },
           { x = 7, y = 8, z = 9 },
           { x = 2, y = 4, z = 8 } ]