My broad thinking is that all syntax can be usefully flattened into "atoms" and "collections" (delimited & separated sequences of atoms and collections), and then a language's syntax can be characterized by "what kinds of atoms are there" and "what kinds of collections are there". The structured editor that I'm building then works at the level of those atoms and collections (the 'reader' phase, in lisp), providing in my opinion a sweet spot of "just enough structure to be powerful & useful without being overly restrictive & annoying".
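To make the atom/collection idea a bit more concrete, here is a rough Python sketch of one possible data model. The names Atom, Collection, and render are made up for illustration, and this is only an assumption about one workable shape, not how the editor actually stores things.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Atom:
    text: str  # e.g. an identifier or a string literal


@dataclass
class Collection:
    open: str                 # opening delimiter, "" if un-delimited
    close: str                # closing delimiter, "" if un-delimited
    separator: Optional[str]  # e.g. ","; None means plain juxtaposition
    items: List["Node"] = field(default_factory=list)


Node = Union[Atom, Collection]


def render(node: Node) -> str:
    """Turn a tree of atoms and collections back into text."""
    if isinstance(node, Atom):
        return node.text
    sep = node.separator if node.separator is not None else ""
    return node.open + sep.join(render(item) for item in node.items) + node.close


# f(a,[b,c]) read as a juxtaposition of an atom and a ()-collection
call = Collection("", "", None, [
    Atom("f"),
    Collection("(", ")", ",", [
        Atom("a"),
        Collection("[", "]", ",", [Atom("b"), Atom("c")]),
    ]),
])
```

Here render(call) gives back the text f(a,[b,c]), so the tree carries enough information to round-trip the source (ignoring whitespace).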
Here's how I would describe various syntaxes:
Forth:
- one collection, the toplevel. delimiters are SOF and EOF, separator is \s+
- atoms are \S+
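As a minimal sketch of that description in Python (read_forth is a hypothetical name), the whole reader is one regex. It deliberately ignores Forth's parsing words, such as ( comments, which consume the following text themselves.

```python
import re


def read_forth(source: str) -> list:
    # One toplevel collection: atoms are \S+, separated by \s+.
    return re.findall(r"\S+", source)


atoms = read_forth(": square dup * ;\n5 square .")
```

The result is a flat list of eight atoms, starting with ":" and "square".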
Lisp:
- atoms are ids [^()\[\]{}"\s]+ or "-delimited strings
- there's the toplevel collection (SOF - EOF), and lists, delimited by () [] or {}
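Here is a rough Python sketch of a reader along those lines. The names TOKEN, CLOSERS, and read_lisp are hypothetical, and I'm assuming whitespace separates atoms and that strings use backslash escapes. Each list is returned as a pair of its opening delimiter and its items; an unterminated string is not reported as an error.

```python
import re

# A token is a "-delimited string, a single delimiter, or an id.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[()\[\]{}]|[^()\[\]{}"\s]+')
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def read_lisp(source: str) -> list:
    tokens = TOKEN.findall(source)
    pos = 0

    def read_items(closer):
        # Collect items until the expected closing delimiter (or EOF at toplevel).
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok in CLOSERS:
                items.append((tok, read_items(CLOSERS[tok])))
            elif tok == closer:
                return items
            elif tok in CLOSERS.values():
                raise SyntaxError(f"unexpected {tok!r}")
            else:
                items.append(tok)
        if closer is not None:
            raise SyntaxError(f"missing {closer!r}")
        return items

    return read_items(None)


tree = read_lisp('(defn f [x] (str "a b" x))')
```

The toplevel collection comes back as a plain list, with one ()-list inside it that in turn holds a []-list and another ()-list.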
JS:
- atoms are ids or "|'-delimited strings
- collections include:
  - [] and () are comma-separated (except for the for-loop (), which is ;-separated)
  - {}-record is a two-column table, with : between cells and , between rows
  - {}-block is ;-separated
  - `-strings consist of a prefix + a two-column table, cells separated by } and rows by ${, first column has expressions and second has string content
  - juxtapositions; this is an un-delimited concatenation of atoms and collections, and encompasses a variety of forms, including unary and binary expressions, postfix forms (function application) and keyword-based control forms (let x = 1, if (x) {y} else {z})
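The "prefix + two-column table" reading of `-strings is probably the least obvious item above, so here is a small Python sketch of it. read_template is a hypothetical name, and the sketch assumes the expressions contain no nested braces and the text has no escaped \${.

```python
def read_template(body: str):
    # body is the text between the backticks.
    # Rows are separated by "${"; within a row, "}" separates
    # the expression cell from the string-content cell.
    prefix, *chunks = body.split("${")
    rows = [tuple(chunk.split("}", 1)) for chunk in chunks]
    return prefix, rows


prefix, rows = read_template("Hello ${name}, you are ${age} years old")
```

This yields the prefix "Hello " and two rows: ("name", ", you are ") and ("age", " years old").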
For the next several languages I'm mostly sticking to "differences from JavaScript":
Elm:
- [] is comma-separated, () is whitespace-separated
- {} is a 2-column table, cells separated by = and rows by ,
- Elm also has significant indentation and newlines at the top level
OCaml:
- 2-tiered juxtapositions, with ; separating the outer and inner tiers
- OCaml has significant newlines to separate toplevel forms
Python:
- blocks are indentation-delimited and separated by ; or \n
Ruby:
- blocks are keyword-delimited
XML:
- atoms are ids or "-delimited strings (within a tag)
- collections include:
  - groups of tags and content, delimited by opening and closing tags
  - attributes are 2-column tables, cells separated by = and rows by whitespace
Markdown:
- collections:
  - headings, delimited by # and \n
  - styled text delimited by * _ [ ]
  - toplevel collection delimited by SOF and EOF
  - tables have rows separated by \n-+-+-\n, columns by |
- atoms are words, more or less
CSV:
- one collection, cells separated by , and rows separated by \n
- atoms are [^,\n]+
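For comparison with the Forth case, the CSV description is a two-level split. This is a minimal Python sketch (read_csv is a hypothetical name) that follows the description literally, so quoted fields containing commas or newlines are not handled.

```python
def read_csv(source: str) -> list:
    # Rows are separated by \n, cells by ","; blank lines are skipped.
    return [row.split(",") for row in source.split("\n") if row]


table = read_csv("name,lang\nada,forth\nbob,lisp\n")
```

The result is a list of three rows with two cells each.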
Are there any other languages that it would be useful to add? I'd love to hear your thoughts!