@GeorgeHernandez
Last active October 26, 2022 19:12
Notes on YAML Ain't Markup Language (YAML) and JavaScript Object Notation (JSON). YAML is a superset of JSON. YAML, JSON, and XML are data serialization languages ordered from more human-friendly to less.
%YAML 1.2
---
YAML and JSON Notes:
README:
Summary: Notes on YAML Ain't Markup Language (YAML) and JavaScript Object Notation (JSON). YAML is a superset of JSON. YAML, JSON, and XML are data serialization languages ordered from more human-friendly to less.
Authors: George Hernandez
Audience: Public
...
---
YAML Syntax:
Comments:
- YAML comments SHALL start with a pound sign (`#`). # E.g. # A YAML comment.
- JavaScript comments SHALL start with double slashes (`//`). # E.g. // A JavaScript comment.
- In JSON, neither YAML nor JavaScript comments are allowed, although some implementations tolerate them.
- HACK for safely making comments in YAML or JSON:
- Use a custom key. E.g.:
"COMMENT": "Say what you want"
- May break schema.
Scopes come in 3 types:
1 Document Scope:
- A YAML stream may contain 0+ documents (or sub-documents).
- A document MAY start with a YAML directive.:
- Only 1 per document.
- The YAML directive MUST be followed by a 3 dash (`---`) document marker.
- E.g.: "%YAML 1.2\n---" # Or see the top of this document.
- Sub documents MUST start with either:
- 1. 3 dashes (`---`).
- 2. The YAML directive followed by a 3 dash (`---`) document marker.
- Sub documents MAY end with 3 period (`...`) document marker.
- Document markers ensure that there are no directives in documents.
- E.g.: |
---
time: 20:03:20
player: Sammy Sosa
action: strike (miss)
...
---
time: 20:03:47
player: Sammy Sosa
action: grand slam
...
- There are 3 kinds of documents:
- 1. Bare documents. No directives, no document markers.
- 2. Explicit documents. Document markers, but no directives.
- 3. Directives documents. Directives & document markers.
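The document markers described above can be illustrated with a minimal Python sketch. `split_documents` is a hypothetical helper, not part of any YAML library, and it is deliberately naive: it only recognizes `---` and `...` when they stand alone on a line, and it skips directive lines that appear between documents.

```python
def split_documents(stream):
    """Naively split a YAML stream into raw document texts (hypothetical helper)."""
    documents, current, in_document = [], [], False
    for line in stream.splitlines():
        if line == "---":
            # Start marker: closes any open document and opens a new one.
            if in_document:
                documents.append("\n".join(current))
            current, in_document = [], True
        elif line == "...":
            # End marker: closes the open document, if any.
            if in_document:
                documents.append("\n".join(current))
            current, in_document = [], False
        elif line.startswith("%") and not in_document:
            # Directive line between documents; not document content.
            continue
        elif in_document or line.strip():
            # Content line; a bare document starts implicitly.
            current.append(line)
            in_document = True
    if in_document:
        documents.append("\n".join(current))
    return documents


stream = "---\ntime: 20:03:20\n...\n---\ntime: 20:03:47\n...\n"
print(len(split_documents(stream)))
```

A real parser also has to handle content on the same line as `---`, comments, and the rule that directives require a following `---`; this sketch ignores all of that.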
2 Block Scope:
- Only spaces NOT tabs are allowed for indentation.
- The child node MUST have more indentation than the parent node.
- Nodes of the same scope have equal indentation.
- Scope ends when a line is encountered that has less indentation than the others in the scope.
- E.g.:
- Block list E.g.:
- This is line 2. Line 3 is the next line. Line 1 is the previous line.
- Line 2 & 3 are siblings and are both children of the line 1.
- Block map E.g.:
a: line 2 & 3 are siblings and children of line 1.
b: 3
- Block string E.g.: |
As
you
wish! # Parent's content starts at column after `- `
3 Flow Scope:
- Nodes of the same scope are marked with explicit syntax instead of indentation.
- E.g.:
Flow list E.g.: [1, b, c: 3]
Flow map E.g.: {a: 1, b: 2, c: 3}
Flow map formatted like a block. E.g.: {
a: 1,
b: 2
}
Flow string E.g.: As you wish!
Nodes come in 3 types:
1. List Nodes:
- A list node is a collection of nodes, each of which may be a different type.
- Lists are aka sequences, arrays.
- Block list nodes are 1 per line and prefixed with a dash-space (`- `). E.g.:
- 1
- b
- c: 3
- d: 4 # d & e have the same indentation
e: 5 # and thus are siblings within the same node.
-
d: 4 # This way is more explicit,
e: 5 # but uses an extra line.
- f: # f & g have the same indentation
g: 7 # but f is the key to g, so f is parent to g.
- -1 # Dash-space allows cases like this.
- Flow list nodes are enclosed with square brackets (`[]`) and delimited by a comma (`,`). E.g.: [1, b, c: 3]
2. Map Nodes:
- A map node is a collection of "key" and "value" pairs separated by a colon (`:`) and space (` `).
- Maps are aka associative arrays, hashes, dictionaries.
- Implicit keys MUST NOT span more than 1024 characters.
- Keys MAY be explicitly prefixed by a question mark and space (`? `) but the value MUST then be on another line. E.g.:
- sea: green
- ? sky
: blue
- ? ["Detroit Tigers", "Chicago Cubs"] # Complex key
: 2001-07-23
- ?
: ack # Empty key
- ? foo
: # Empty value
- ? # Empty key & value
- Block map nodes are 1 per line. E.g.:
name: Julia
gender: Female
- Flow map nodes are enclosed with curly brackets (`{}`) and delimited by a comma (`,`). E.g.: {name: George, gender: Male}
3. String Nodes:
- A string node in YAML & JSON is a scalar, i.e. strings are not a collection of sub-nodes.:
- Of course in other settings, strings are a collection or a sequence of characters.
- Flow Strings:
- Flow strings SHALL be listed 1 of 3 ways:
- Plain, no quotes. E.g.: Don't "dream it", "be it". Escapes like \n DON'T work here.
# Most readable, but more limited.
# Cannot start with indicators.
# Cannot contain : or # or many other YAML characters.
# Must be 1 line if used as a key.
# Not allowed in JSON.
- Single quotes (`'`). E.g.: 'Don''t "dream it", "be it". Escapes like \n DON''T work here.'
# Not allowed in JSON.
- Double quotes (`"`). E.g.: "Don't \"dream it\", \"be it\". Escapes like \n DO work here."
MAY have C-like escapes using backslash (`\`). E.g.: "\x3B \u003B \U0000003B"
- Flow strings CAN span multiple lines.:
- Line breaks are folded. E.g.: Don't "dream it",
"be it".
# Interpreted as:
# Don't "dream it", "be it".
- Double quoted can both fold AND force line breaks. E.g.:
"Line 1 and
line 2 will be folded.

The double return will not.
This line has a forced break\n\
here."
# Interpreted as:
# Line 1 and line 2 will be folded.\nThe double return will not. This line has a forced break\nhere.
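The folding rule for multi-line flow strings can be sketched in Python. `fold_flow_lines` is a hypothetical helper written for illustration: it assumes the raw text between the quotes, trims whitespace around every line (a simplification), and does not process backslash escapes.

```python
def fold_flow_lines(text):
    """Fold line breaks the way multi-line flow strings do (simplified sketch)."""
    lines = [line.strip() for line in text.split("\n")]
    result = lines[0]
    blank_run = 0
    for line in lines[1:]:
        if line == "":
            blank_run += 1          # count empty lines between content lines
            continue
        # A lone line break folds to a space; N empty lines become N newlines.
        result += "\n" * blank_run if blank_run else " "
        result += line
        blank_run = 0
    return result


raw = "Line 1 and\nline 2 will be folded.\n\nThe double return will not."
print(repr(fold_flow_lines(raw)))
```

The single break between the first two lines becomes a space, while the empty line survives as one `\n`.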
- Block Strings:
- Block strings have a header that precedes the indented block on the next line. E.g.: >
This is a block string,
introduced by the `>` header.
- Comments can be placed on the header line or after the block, but NOT in the block. E.g.: | # Comment OK here
a
/# No comments in the string block
c
# Comment OK here BUT must be less indented to avoid ambiguity.
# Subsequent comments can be any indentation.
- Block string headers MUST have at least 1 of a variety of indicators:
- No indicators, therefore NOT a block string. E.g.:
This looks like a block string,
but it is really a flow string!
- Pipe (`|`) indicator for Literal mode. The simplest and most readable, but most restricted mode. E.g.: |
DOES NOT remove leading whitespace.
Escapes like \n DON'T work here.
DOES NOT fold single newlines to a space.
Double newlines are kept as double newlines.
There once was a short man from Ealing
Who got on a bus to Darjeeling
It said on the door
"Please don't spit on the floor"
So he carefully spat on the ceiling
The block ends with a new line.
- Greater than bracket (`>`) indicator for Folded mode. E.g.: >
DOES remove leading whitespace.
Escapes like \n DON'T work here.
DOES fold single newlines to a space.
Double newlines are converted to a line break.
Wrapped text
will be folded
into a single
paragraph or line.
Leading spaces
will be folded into the single space
representing the previous new line.
The block ends with a new line.
- Indentation indicator. E.g.: |2
Indentation indicators explicitly set
the indentation depth. This is useful for
cases where the 1st line of a paragraph is
indented more deeply than the rest of the
paragraph.
- Chomping indicators. Control how the final line breaks and trailing empty lines are interpreted. There are 3 chomping variants:
1. No indicator for Clip chomping, the default behavior. E.g.: |
The final line break is preserved.
Trailing empty lines are discarded.
# Equivalent to: "The final line break is preserved.\nTrailing empty lines are discarded.\n"
2. Dash (`-`) indicator for Strip chomping. E.g.: |-
The final line break is discarded.
Trailing empty lines are discarded.
# Equivalent to: "The final line break is discarded.\nTrailing empty lines are discarded."
3. Plus (`+`) indicator for Keep chomping. E.g.: |+
The final line break is preserved.
Trailing empty lines are preserved.
# Equivalent to: "The final line break is preserved.\nTrailing empty lines are preserved.\n\n"
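The three chomping variants can be sketched as a small Python function. `chomp` is a hypothetical helper, and it assumes the block content has already been dedented and that every line, including the last, ends with `\n`.

```python
def chomp(block, indicator=""):
    """Apply clip (""), strip ("-"), or keep ("+") chomping to block text."""
    body = block.rstrip("\n")       # content without any trailing line breaks
    if indicator == "-":
        return body                 # strip: drop the final break and empty lines
    if indicator == "+":
        return block                # keep: preserve every trailing line break
    if indicator == "":
        # clip: keep one final break, drop trailing empty lines
        return body + "\n" if body else ""
    raise ValueError(f"unknown chomping indicator: {indicator!r}")


block = "first line\nsecond line\n\n\n"
for indicator in ("", "-", "+"):
    print(repr(indicator), repr(chomp(block, indicator)))
```

Only the tail of the string differs between the three modes; the interior line breaks are untouched.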
- String Summary Table: In this table `_` indicates a space character (` `).
#                     | >     | |     | plain | "     | '     | >-    | >+    | |-    | |+
# --------------------|-------|-------|-------|-------|-------|-------|-------|-------|------
# Trailing spaces     | Kept  | Kept  |       |       |       | Kept  | Kept  | Kept  | Kept
# Single newline =>   | _     | \n    | _     | _     | _     | _     | _     | \n    | \n
# Double newline =>   | \n    | \n\n  | \n    | \n    | \n    | \n    | \n    | \n\n  | \n\n
# Final newline =>    | \n    | \n    |       |       |       |       | \n    |       | \n
# Final dbl nl's =>   |       |       |       |       |       |       | Kept  |       | Kept
# In-line newlines    | No    | No    | No    | \n    | No    | No    | No    | No    | No
# Spaceless newlines  | No    | No    | No    | \     | No    | No    | No    | No    | No
# Single quote        | '     | '     | '     | '     | ''    | '     | '     | '     | '
# Double quote        | "     | "     | "     | \"    | "     | "     | "     | "     | "
# Backslash           | \     | \     | \     | \\    | \     | \     | \     | \     | \
# " #", ": "          | Ok    | Ok    | No    | Ok    | Ok    | Ok    | Ok    | Ok    | Ok
# Can start on same   | No    | No    | Yes   | Yes   | Yes   | No    | No    | No    | No
#   line as key
#
# ~ https://stackoverflow.com/questions/3790454/in-yaml-how-do-i-break-a-string-over-multiple-lines
Node Properties:
- A node MAY have 2 optional properties, a tag and an anchor, in addition to its content.
- The properties come before the content, in either order. E.g.:
a : My key has no additional properties. The type depends on the app.
!!str b : My key has a tag property for explicitly identifying its type.
&myX c : My key has an anchor property which can be aliased later.
&myY !!str e: My key has tag & anchor properties.
- VSCode syntax highlighting prefers anchors 1st.
Nodes with Anchors & Aliases:
- 1. A node to be repeated must 1st be anchored, i.e. marked with an ampersand (`&`) concatenated with an ID.
- 2. The node MAY then be aliased, i.e. referenced with an asterisk (`*`) concatenated with the ID.
- The anchor and alias syntax is reminiscent of pointer syntax in C.
- If an anchor ID is reused on another node, then subsequent aliases will refer to the latest anchor.
- NOT allowed in JSON.
- E.g. without anchoring & aliasing:
hr: # 1998 hr ranking
- Mark McGwire
- Sammy Sosa
rbi:
# 1998 rbi ranking
- Sammy Sosa
- Ken Griffey
- E.g. with anchoring & aliasing:
hr: # 1998 hr ranking
- Mark McGwire
- &SS Sammy Sosa
rbi:
# 1998 rbi ranking
- *SS
- Ken Griffey
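Because JSON has no anchors or aliases, a node that YAML writes once and references by alias has to be written out in full each time in JSON. The Python sketch below illustrates this with the standard `json` module; the variable names are illustrative only.

```python
import json

# One value referenced from two places, mirroring `&SS Sammy Sosa` and `*SS`.
sammy = "Sammy Sosa"
rankings = {
    "hr": ["Mark McGwire", sammy],
    "rbi": [sammy, "Ken Griffey"],
}

# JSON has no alias syntax, so the shared value is serialized twice.
text = json.dumps(rankings)
print(text)
print(text.count("Sammy Sosa"))
```

The data survives the round trip through JSON unchanged, but the information that both entries were the same node is lost.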
Nodes with Tags:
- A node MAY have a tag property for explicitly identifying its type. E.g.:
a: 123 # implicitly identified as int
b: !!str 123 # explicitly identified as str
- 1. The Handle (namespace abbreviation) of a tag refers to a namespace that is either:
- 1.1. The default.
- 1.2. Overridden with a TAG directive and its TAG Prefix.
- 2. The tag property inserted in a node specifies either:
- 2.1. The Handle and type.
- 2.2. The fully qualified namespace and type.
- TAG Directive:
- The `%TAG` directive overrides the default namespace for a tag.
- Syntax: "%TAG tag-handle tag-prefix"
- TAG directives MUST be followed by a 3 dash (`---`) document marker.
- Each TAG directive associates a Handle with a Prefix.:
- 1 The Tag Handle is a namespace abbreviation.
- 2 The Tag Prefix is the fully qualified namespace.
- Tag Handles come in 3 varieties:
1. Primary Tag Handles:
The Primary namespace has a Tag Handle of single exclamation mark (`!`). E.g.:
# # ! refers to the default namespace of local
# !int 123
# ...
# # Override the local with a global prefix:
# %TAG ! tag:example.com,2000:app/
# ---
# !int "foo"
#
# # Override the local with a verbatim tag:
# !<tag:example.com,2001:app/int> "bar"
# ...
2. Secondary Tag Handles:
The Secondary namespace has a Tag Handle of double exclamation marks (`!!`). E.g.:
# # !! refers to the default namespace of `tag:yaml.org,2002:`
# !!int 123
# ...
# # Override with a global prefix
# %TAG !! tag:example.com,2000:app/
# ---
# !!int "foo"
#
# # Override the local with a verbatim tag:
# !<tag:example.com,2001:app/int> "bar"
# ...
3. Named Tag Handles:
A Named namespace has a Tag Handle of a name surrounded by exclamation marks (`!`). E.g.:
# %TAG !my! tag:example.com,2000:app/
# ---
# !my!foo "bar"
- Tag Prefixes come in 2 varieties:
1. Local Tag Prefixes:
Local tag prefixes are prefixed by an exclamation mark (`!`), followed by a string that is NOT a valid URI. E.g.:
# %TAG !m! !my-
# ---
# !m!light fluorescent
#
# # Override the local with a verbatim tag:
# !<!my-light> green
2. Global Tag Prefixes:
Global tag prefixes are indicated by a valid URI. E.g.:
# %TAG ! tag:example.com,2000:app/
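The way a Tag Handle and a Tag Prefix combine into a full tag can be sketched in Python. `expand_tag` and `DEFAULT_HANDLES` are hypothetical names for this illustration, and the `directives` argument stands in for the `%TAG` lines of one document.

```python
import re

# Default handles described above: `!` is local, `!!` is the yaml.org namespace.
DEFAULT_HANDLES = {"!": "!", "!!": "tag:yaml.org,2002:"}

# A shorthand tag is a handle (`!`, `!!`, or `!name!`) followed by a suffix.
SHORTHAND = re.compile(r"(!(?:[0-9A-Za-z-]*!)?)(.+)")


def expand_tag(tag, directives=None):
    """Return the full tag for a shorthand or verbatim tag (hypothetical helper)."""
    if tag.startswith("!<") and tag.endswith(">"):
        return tag[2:-1]            # verbatim tags bypass handle lookup
    handles = {**DEFAULT_HANDLES, **(directives or {})}
    match = SHORTHAND.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a shorthand tag: {tag!r}")
    handle, suffix = match.groups()
    if handle not in handles:
        raise ValueError(f"undeclared tag handle: {handle!r}")
    return handles[handle] + suffix


print(expand_tag("!!int"))
print(expand_tag("!my!foo", {"!my!": "tag:example.com,2000:app/"}))
print(expand_tag("!m!light", {"!m!": "!my-"}))
```

A named handle with no matching `%TAG` directive raises an error here, which matches the rule that named handles must be declared before use.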
YAML Schemas:
- A YAML schema is a combination of a set of tags and a mechanism for resolving non-specific tags.
- Some recommended YAML Schemas:
- 1. Failsafe Schema:
- Guaranteed to work with any YAML document.
- The namespace for Secondary Tags. E.g.: "!!map"
- Tags:
- tag:yaml.org,2002:map
- tag:yaml.org,2002:seq: For lists
- tag:yaml.org,2002:str
- 2. JSON Schema:
- The JSON schema is the lowest common denominator of most modern computer languages, & allows parsing JSON files.
- Tags:
- tag:yaml.org,2002:null
- tag:yaml.org,2002:bool: [true, false]
- tag:yaml.org,2002:int:
- RegEx: "0 | -? [1-9] [0-9]*"
- Roughly: ..., -1, 0, 1, ...
- tag:yaml.org,2002:float:
- [0, .inf, -.inf, .nan]
- RegEx: '-? [1-9] ( \. [0-9]* [1-9] )? ( e [-+] [1-9] [0-9]* )?'
- Roughly: -.inf, like -1.23e-4, like -123.4, 0, like 123.4, like 1.23e+4, .inf, .nan
- 3. Core Schema:
- The Core schema is an extension of the JSON schema, allowing for more human-readable presentation of the same types.
- Allows RegEx to resolve to Failsafe or JSON Schema tags: '
null | Null | NULL | ~ tag:yaml.org,2002:null
/* Empty */ tag:yaml.org,2002:null
true | True | TRUE | false | False | FALSE tag:yaml.org,2002:bool
[-+]? [0-9]+ tag:yaml.org,2002:int (Base 10)
0o [0-7]+ tag:yaml.org,2002:int (Base 8)
0x [0-9a-fA-F]+ tag:yaml.org,2002:int (Base 16)
[-+]? ( \. [0-9]+ | [0-9]+ ( \. [0-9]* )? ) ( [eE] [-+]? [0-9]+ )? tag:yaml.org,2002:float (Number)
[-+]? ( \.inf | \.Inf | \.INF ) tag:yaml.org,2002:float (Infinity)
\.nan | \.NaN | \.NAN tag:yaml.org,2002:float (Not a number)
* tag:yaml.org,2002:str (Default)
'
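The Core schema table above can be turned into a small resolver for plain (unquoted) scalars. This Python sketch transcribes the listed regular expressions with the spaces removed; `CORE_RULES` and `resolve_core_tag` are hypothetical names, and the rules are tried in the order listed, with the first full match winning.

```python
import re

NULL, BOOL = "tag:yaml.org,2002:null", "tag:yaml.org,2002:bool"
INT, FLOAT = "tag:yaml.org,2002:int", "tag:yaml.org,2002:float"
STR = "tag:yaml.org,2002:str"

# Patterns transcribed from the Core schema table, in the same order.
CORE_RULES = [
    (re.compile(r"null|Null|NULL|~"), NULL),
    (re.compile(r""), NULL),                                  # empty scalar
    (re.compile(r"true|True|TRUE|false|False|FALSE"), BOOL),
    (re.compile(r"[-+]?[0-9]+"), INT),                        # base 10
    (re.compile(r"0o[0-7]+"), INT),                           # base 8
    (re.compile(r"0x[0-9a-fA-F]+"), INT),                     # base 16
    (re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"), FLOAT),
    (re.compile(r"[-+]?(\.inf|\.Inf|\.INF)"), FLOAT),         # infinity
    (re.compile(r"\.nan|\.NaN|\.NAN"), FLOAT),                # not a number
]


def resolve_core_tag(plain):
    """Return the Core schema tag for a plain scalar (sketch of the table)."""
    for pattern, tag in CORE_RULES:
        if pattern.fullmatch(plain):
            return tag
    return STR                                                # default


for value in ["~", "True", "yes", "0o14", "1.5e3", "-.INF", "hello"]:
    print(repr(value), resolve_core_tag(value))
```

Note that `yes` falls through to `str`: the YAML 1.1 boolean spellings are not part of the Core schema.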
- Language-Independent Types for YAML Version 1.1:
- http://yaml.org/type/
- This includes several useful types NOT in the recommended YAML schemas.
- Implementations seem to implement only some of the types.
- Collection Types:
- "!!map" : "Unordered set of key : value pairs without duplicates."
- "!!omap" : "Ordered sequence of key: value pairs without duplicates."
- "!!pairs" : "Ordered sequence of key: value pairs allowing duplicates."
- "!!set" : "Unordered set of non-equal values."
- "!!seq" : "Sequence of arbitrary values."
- Scalar Types:
- "!!binary" :
- "A sequence of zero or more octets (8 bit values)."
- E.g.:
# e: !!binary |
# R0lGODlhDAAMAIQAAP//9/X
# 17unp5WZmZgAAAOfn515eXv
# Pz7Y6OjuDg4J+fn5OTk6enp
# # as base64
- "!!bool" :
- "Mathematical Booleans."
- RegEx: y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF
- "!!float" :
- "Floating-point approximation to real numbers."
- RegEx: '
[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)? (base 10)
|[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]* (base 60)
|[-+]?\.(inf|Inf|INF) # (infinity)
|\.(nan|NaN|NAN) # (not a number)'
- "!!int" :
- "Mathematical integers."
- Not just decimal. E.g.: '
canonical : 685230
decimal : +685_230
octal : 02472256
hexadecimal: 0x_0A_74_AE
binary : 0b1010_0111_0100_1010_1110
sexagesimal: 190:20:30'
- "!!merge" : "Specify one or more mappings to be merged with the current one."
- "!!null" :
- "Devoid of value."
- RegEx: '
~ # (canonical)
|null|Null|NULL # (English)
| # (Empty)'
- "!!str" : "A sequence of zero or more Unicode characters."
- "!!timestamp":
- "A point in time."
- E.g.:
myIso8601 : 2001-12-14t21:59:43.10-05:00
mySpaced : 2001-12-14 21:59:43.10 -5
myDate : 2002-12-14
- "!!value" : "Specify the default value of a mapping."
- "!!yaml" : "Keys for encoding YAML in YAML."
Misc:
Indicator Characters:
- Indicators are characters that have special semantics in YAML.
- There are 19 in YAML1.2: " - ? : , [ ] { } # & * ! | > '\" % @ `"
- The at (`@`) and grave accent ("\`") are reserved indicators in YAML1.2 but have no particular purpose yet. Plain strings (unquoted) cannot start with these reserved characters. # I tend to use ` in Markdown fashion to indicate code.
Escaped Characters:
- Indicator or non-printable characters must be escaped.
- YAML escape sequences are a superset of C's escape sequences. :
- YAML: " \0 \a \b \t \n \v \f \r \e \ \" \/ \\ \N \L \P \x4A \u004A \U0000004A "
- JSON: " \b \t \n \f \r \" \/ \\ \u004A "
Empty Node:
Empty nodes are commonly resolved to empty strings or null. E.g.:
- {
'' : b, # key is empty string
a : '' # value is empty string
}
- {
: c, # key is null
? d
: # value is null
}
---
YAML E.g.:
receipt: Oz-Ware Purchase Invoice
date: 2012-08-06
customer:
given: Dorothy
family: Gale
items:
- part_no: A4786
descrip: Water Bucket (Filled)
price: 1.47
quantity: 4
- part_no: E1628
descrip: High Heeled "Ruby" Slippers
size: 8
price: 100.27
quantity: 1
total: 106.15
tax: 1.00
grandtotal: 107.15
bill-to: &id001
street: |
123 Tornado Alley
Suite 16
city: East Centerville
state: KS
ship-to: *id001
specialDelivery: >
Follow the Yellow Brick
Road to the Emerald City.
Pay no attention to the
man behind the curtain.
...
---
{
"Notes on JSON": {
"Intro": [
"JSON is a subset of YAML. YAML 1.2 is a superset of JSON.",
"In honor of that, this section is done as JSON!",
"JSON's simplicity makes it more universal but less human-friendly than YAML."
],
"JSON's Components": {
"Comments" : [
"JavaScript comments SHALL start with double slashes (`//`). # E.g.: // A JavaScript comment.",
"However, neither YAML nor JavaScript comments are allowed in JSON!",
"Nevertheless, people often add JavaScript comments anyway.",
"Microsoft Visual Studio Code allows JavaScript-style comments in its JSONC config files",
{
"HACK for making comments": {
"Use a custom key. E.g.": {
"COMMENT": "Say what you want to say."
},
"Qualification": "Beware of breaking your schema"
}
}
],
"Scope": "Only flow scope allowed, but it is often formatted like a block.",
"Nodes": {
"Null": null,
"Boolean": [true, false],
"Numbers": {
"int": {
"RegEx": "0 | -? [1-9] [0-9]*",
"Roughly": "..., -1, 0, 1, ...",
"E.g.": [-999, -1, 0, 1, 999]
},
"float": {
"E.g.": [-123.4, -1.23e-4, 0, 123.4, 1.23e+4],
"Qualification": "Infinity and NaN cannot be written as JSON numbers."
}
},
"Strings": [
"Only flow double-quoted strings allowed.",
{"Allowed Escapes": " \" \\ \/ \b \f \n \r \t \u004A "},
{"E.g.": "This is a double-quoted string."}
],
"Arrays": [
"Only flow lists allowed.",
"Arrays can hold any kind of node.",
{"E.g.": [1, "b", {"c": 3}]}
],
"Objects": [
"Only flow maps allowed.",
"Keys must be strings.",
{
"E.g.": {
"a": 1,
"b": "2",
"c": {"d": null}
}
}
]
}
}
}
}
...
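The comment rules for JSON noted above can be checked with Python's standard `json` module, which follows the strict grammar for comments. `parse_or_none` is a hypothetical helper for this sketch; other parsers, such as JSONC-aware ones, may behave differently.

```python
import json


def parse_or_none(text):
    """Return the parsed JSON value, or None if the text is rejected."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


with_js_comment = '{"a": 1 // A JavaScript comment.\n}'
with_yaml_comment = '{"a": 1 # A YAML comment.\n}'
with_comment_key = '{"COMMENT": "Say what you want to say.", "a": 1}'

print(parse_or_none(with_js_comment))
print(parse_or_none(with_yaml_comment))
print(parse_or_none(with_comment_key))
```

Both comment styles are rejected, while the custom-key workaround parses cleanly. The extra key does become part of the data, which is why it can break a schema.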
---
Links:
JSON:
- http://www.json.org:
- http://www.json.org/example.html
- https://en.wikipedia.org/wiki/JSON
- http://json-schema.org/. JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
- http://jsonlint.com/
- http://jsonviewer.stack.hu/
- http://jsonformatter.curiousconcept.com/
- http://ndjson.org/. Basically multiple JSON objects that are newline delimited.
- Convert from CSV to JSON online:
- http://keyangxiang.com/blog/csv2json/
- http://shancarter.github.io/mr-data-converter/
- http://www.convertcsv.com/csv-to-json.htm
- http://www.cparker15.com/code/utilities/csv-to-json/
YAML:
- http://www.yaml.org:
- http://www.yaml.org/start.html
- http://www.yaml.org/refcard.html
- http://www.yaml.org/spec/1.2/spec.html :
- Each syntax unit has both a number and name in BNF. E.g.: "[12] c-comment ::= “#”"
- The start of each name identifies its category:
e- : No characters.
c- : Starting and ending with a special character.
b- : A single line break.
nb- : Starting and ending with a non-break character.
s- : Starting and ending with a white space character.
ns- : Starting and ending with a non-space character.
l- : Matching complete line(s).
X-Y- : Starting with an X- character and ending with a Y- character, where X- and Y- are any of the above prefixes.
X+, X-Y+: As above, with the additional property that the matched content indentation level is greater than the specified n parameter.
- http://www.yaml.org/faq.html. "Why does YAML forbid tabs? Tabs have been outlawed since they are treated differently by different editors and tools. And since indentation is so critical to proper interpretation of YAML, this issue is just too tricky to even attempt. Indeed Guido van Rossum of Python has acknowledged that allowing TABs in Python source is a headache for many people and that were he to design Python again, he would forbid them."
- http://yaml.org/type/. For YAML1.1.
- https://en.wikipedia.org/wiki/YAML
- https://stackoverflow.com/questions/3790454/in-yaml-how-do-i-break-a-string-over-multiple-lines
- http://yaml-multiline.info/
...