
@jaredly
Created November 9, 2024 04:28

Thinking about "syntax families", and how to ~categorize them.

My broad thinking is that all syntax can be usefully flattened into "atoms" and "collections" (delimited & separated sequences of atoms and collections). A language's syntax can then be characterized by "what kinds of atoms are there" and "what kinds of collections are there". The structured editor that I'm building works at the level of those atoms and collections (the 'reader' phase, in Lisp terms), which in my opinion hits a sweet spot: just enough structure to be powerful & useful without being overly restrictive & annoying.
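
To make the model concrete, here's a minimal Python sketch of the data shape it implies. The names Atom, Collection, render and the opener/closer/sep fields are hypothetical, not the editor's actual representation. A collection only records its delimiters and separator, which is enough to print it back out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Atom:
    text: str  # e.g. an identifier, number, or string literal


@dataclass
class Collection:
    # Hypothetical fields: the delimiters ("" for undelimited collections such
    # as the toplevel) and the separator written between items.
    opener: str
    closer: str
    sep: str
    items: List[Node] = field(default_factory=list)


Node = Union[Atom, Collection]


def render(node: Node) -> str:
    """Print a node back to source text."""
    if isinstance(node, Atom):
        return node.text
    inner = node.sep.join(render(item) for item in node.items)
    return node.opener + inner + node.closer
```

The same Collection type covers a Lisp list, Collection("(", ")", " ", ...), and a JS array, Collection("[", "]", ", ", ...); only the delimiters and separator differ.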

Here's how I would describe various syntaxes:

Programming languages

Forth:

  • one collection, the toplevel. delimiters are SOF and EOF, separator is \s+
  • atoms are \S+
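
With atoms as \S+ and a single SOF-to-EOF collection, the whole Forth reader fits in one regex. A sketch (read_forth is a hypothetical name); it treats comment and string words like ( and ." as ordinary atoms, which matches the description above.

```python
import re


def read_forth(src):
    # The toplevel collection runs from SOF to EOF; atoms are maximal runs of
    # non-whitespace, so the \s+ separators simply disappear.
    return re.findall(r"\S+", src)
```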

Lisp:

  • atoms are ids [^()\[\]{}"\s]+ or "-delimited strings
  • there's the toplevel collection (SOF - EOF), and lists, delimited by () [] or {}; both are whitespace-separated
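
A minimal reader for this description might look like the following sketch. Assumptions: strings allow backslash escapes, and read_lisp, TOKEN and CLOSERS are hypothetical names. Atoms come back as plain strings and each list as a tuple of (opening delimiter, items).

```python
import re

# Assumption: ids are runs of anything except brackets, quotes, and whitespace;
# strings are "..." with backslash escapes.
TOKEN = re.compile(r'([()\[\]{}])|("(?:[^"\\]|\\.)*")|([^()\[\]{}"\s]+)')
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def read_lisp(src):
    """Read src as the toplevel collection (SOF to EOF)."""
    stack = [("", [])]  # the toplevel has no visible delimiters
    pos = 0
    while True:
        # Whitespace is the separator for every collection.
        while pos < len(src) and src[pos].isspace():
            pos += 1
        if pos == len(src):
            break
        m = TOKEN.match(src, pos)
        if m is None:
            # Only a '"' without a closing quote fails to match.
            raise SyntaxError(f"unterminated string at {pos}")
        pos = m.end()
        bracket, string, ident = m.groups()
        if bracket in CLOSERS:
            stack.append((bracket, []))  # open a new list
        elif bracket:
            opener, items = stack.pop()  # close the innermost list
            if CLOSERS.get(opener) != bracket:
                raise SyntaxError(f"unexpected {bracket!r} at {m.start()}")
            stack[-1][1].append((opener, items))
        else:
            stack[-1][1].append(string or ident)
    if len(stack) != 1:
        raise SyntaxError("unclosed list at EOF")
    return stack[0][1]
```

Keeping the opening delimiter on each list is what lets the editor treat (), [] and {} as three kinds of the same collection.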

JS:

  • atoms are ids or "- or '-delimited strings
  • collections include:
    • [] and () are comma-separated (except for the for-loop (), which is ;-separated)
    • {}-record is a two-column table, with : between cells and , between rows
    • {}-block is ;-separated
    • `-strings consist of a string prefix followed by a two-column table: cells are separated by } and rows by ${. The first column holds expressions and the second holds string content
    • juxtapositions: undelimited concatenations of atoms and collections, covering a variety of forms, including unary and binary expressions, postfix forms such as function application (fn(x)), and keyword-based control forms (let x = 1, if (x) {y} else {z})
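
The template-string "table" can be sketched by splitting the body (the text between the backticks) on ${ and then on the first }. This assumes no nested braces, nested templates, or escaped \${ inside the literal; split_template is a hypothetical helper.

```python
def split_template(body):
    """Split a template literal body into (prefix, rows).

    Each row is (expression_source, string_content): rows are separated by
    "${" and the two cells within a row by the first "}".
    """
    prefix, *rest = body.split("${")
    rows = []
    for chunk in rest:
        expr, _, text = chunk.partition("}")
        rows.append((expr, text))
    return prefix, rows
```

For `Hello ${name}, you are ${age} years old` this gives the prefix "Hello " and the rows ("name", ", you are ") and ("age", " years old").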

For the next several languages I'm mostly sticking to "differences to javascript":

Elm:

  • [] is comma-separated, () is whitespace-separated
  • {} is a 2-column table, cells separated by = and rows by ,
  • Elm also has significant indentation and newlines at the top level
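
The JS {}-record and the Elm record are the same two-column table; only the cell and row separators differ. A naive sketch with a hypothetical split_table helper, assuming the cells contain no nested collections or strings that include the separators:

```python
def split_table(body, cell_sep, row_sep):
    """Split the inside of a record into (key, value) rows.

    Naive: assumes no nested collections or strings containing the separators.
    """
    rows = []
    for row in body.split(row_sep):
        if not row.strip():
            continue  # tolerate a trailing row separator
        key, _, value = row.partition(cell_sep)
        rows.append((key.strip(), value.strip()))
    return rows
```

split_table(" x = 1, y = 2 ", "=", ",") reads an Elm record body, and split_table("a: 1, b: 2", ":", ",") reads a JS one.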

OCaml:

  • 2-tiered juxtapositions, with ; separating the outer and inner tiers
  • OCaml separates toplevel forms with keywords like let (or, optionally, ;;) rather than with significant newlines

Python:

  • blocks are indentation-delimited, and the statements inside them are separated by ; or \n
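
Indentation acting as a delimiter can be sketched with a stack of open blocks. Assumptions: spaces-only indentation, no bracketed line continuations, and a naive ; split that would also split a ; inside a string literal. read_blocks is a hypothetical name.

```python
def read_blocks(src):
    """Nest statements into lists by indentation width."""
    root = []
    stack = [(0, root)]  # (indent width, block) for each open block
    for line in src.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        indent = len(line) - len(line.lstrip(" "))
        if indent > stack[-1][0]:
            # Deeper indentation opens a new block inside the current one.
            block = []
            stack[-1][1].append(block)
            stack.append((indent, block))
        else:
            # Shallower indentation closes blocks until the levels match.
            while indent < stack[-1][0]:
                stack.pop()
            if indent != stack[-1][0]:
                raise IndentationError("unindent does not match any outer level")
        # Both ';' and '\n' separate statements (naive: ignores ';' in strings).
        for stmt in line.strip().split(";"):
            if stmt.strip():
                stack[-1][1].append(stmt.strip())
    return root
```

Here the "delimiters" never appear as characters: a block opens when indentation grows and closes when it shrinks.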

Ruby:

  • blocks are keyword-delimited (e.g. def ... end, do ... end)

Markup languages

XML:

  • atoms are ids or "-delimited strings (within a tag)
  • collections include:
    • <tag>...</tag>-delimited groups of tags and content
    • attributes are 2-column tables, cells separated by = and rows by whitespace
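
Because attribute values are "-delimited atoms, whitespace inside a value isn't a row separator, so reading the string atoms first is what makes the table split cleanly. A sketch assuming double-quoted values only (XML also allows single quotes, not handled here); ATTR and read_attributes are hypothetical names.

```python
import re

# A name, '=', then a "-delimited string; the whitespace between attributes
# acts as the row separator.
ATTR = re.compile(r'([^\s=]+)\s*=\s*"([^"]*)"')


def read_attributes(attrs):
    return ATTR.findall(attrs)
```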

Markdown:

  • collections:
    • headings, delimited by # and \n
    • styled text delimited by * _ [ ]
    • toplevel collection delimited by SOF and EOF
    • tables have rows separated by \n (with a |---|---| delimiter row under the header), columns by |
  • atoms are words, more or less

CSV:

  • one collection, cells separated by , and rows separated by \n
  • atoms are [^,\n]+
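
The description above maps directly onto two splits. For comparison, Python's standard csv module also handles quoted fields, where a , inside quotes is part of the atom rather than a separator. The function names are hypothetical.

```python
import csv
import io


def read_csv_naive(src):
    """Rows separated by newlines, cells by ',', exactly as described above."""
    return [row.split(",") for row in src.splitlines()]


def read_csv_quoted(src):
    """The stdlib reader, which also understands "..."-quoted fields."""
    return list(csv.reader(io.StringIO(src)))
```

On a,"b,c" the naive version returns ['a', '"b', 'c"'], while the stdlib reader returns ['a', 'b,c']; quoted strings are effectively a second kind of atom.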

Are there any other languages that it would be useful to add? I'd love to hear your thoughts!
