@sogaiu
Last active July 13, 2021 03:45
tree-sitter term glossary draft

Tree-sitter Glossary

General

  • anonymous node

  • child node

  • field

    To make syntax nodes easier to analyze, many grammars assign unique field names to particular child nodes.

  • grammar

  • grammar rule

    Every grammar rule is written as a JavaScript function that takes a parameter conventionally called $. The syntax $.identifier is how you refer to another grammar symbol within a rule.

  • language

    A TSLanguage is an opaque object that defines how to parse a particular programming language. The code for each TSLanguage is generated by Tree-sitter.

  • named node

  • node

    A TSNode represents a single node in the syntax tree. It tracks its start and end positions in the source code, as well as its relation to other nodes like its parent, siblings and children.

  • parent node

  • parser

    A TSParser is a stateful object that can be assigned a TSLanguage and used to produce a TSTree based on some source code.

  • sibling node

  • syntax tree

    A TSTree represents the syntax tree of an entire source code file. It contains TSNode instances that indicate the structure of the source code. It can also be edited and used to produce a new TSTree in the event that the source code changes.

  • terminal symbol

  • token
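The node-related entries above (node, parent, child and sibling node, named and anonymous node, field) can be made concrete with a small Python model. This is a self-contained toy, not a Tree-sitter binding. The `Node` class and the `binary_expression`, `number` and `+` node types are hypothetical. Attribute names such as `named_children`, `next_sibling` and `child_by_field_name` were picked to loosely echo the Python binding. The `+` operator is modeled as an anonymous node, as literal tokens usually are.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Toy syntax-tree node; a hypothetical model, not a Tree-sitter type."""
    type: str
    is_named: bool      # False for anonymous nodes such as literal tokens
    start_byte: int     # start position in the source (inclusive)
    end_byte: int       # end position in the source (exclusive)
    children: list[Node] = field(default_factory=list)
    fields: dict[str, int] = field(default_factory=dict)  # field name -> child index
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Point each child back at this node so parent/sibling lookups work.
        for child in self.children:
            child.parent = self

    @property
    def named_children(self) -> list[Node]:
        return [c for c in self.children if c.is_named]

    @property
    def next_sibling(self) -> Optional[Node]:
        if self.parent is None:
            return None
        siblings = self.parent.children
        i = next(k for k, c in enumerate(siblings) if c is self)
        return siblings[i + 1] if i + 1 < len(siblings) else None

    def child_by_field_name(self, name: str) -> Optional[Node]:
        i = self.fields.get(name)
        return None if i is None else self.children[i]

    def text(self, source: bytes) -> str:
        return source[self.start_byte:self.end_byte].decode()


# Hypothetical tree for the source text `1 + 2`.
source = b"1 + 2"
left = Node("number", True, 0, 1)
op = Node("+", False, 2, 3)
right = Node("number", True, 4, 5)
expr = Node("binary_expression", True, 0, 5, children=[left, op, right],
            fields={"left": 0, "operator": 1, "right": 2})

print([c.type for c in expr.children])                  # ['number', '+', 'number']
print([c.type for c in expr.named_children])            # ['number', 'number']
print(expr.child_by_field_name("right").text(source))   # 2
print(left.next_sibling.type)                           # +
print(right.parent is expr)                             # True
```

Fields are stored here as child indices purely to keep the toy small. The point is that a field is just a name attached to one particular child.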

Queries

Follow the links to see examples :)

  • alternation

    An alternation is written as a pair of square brackets ([]) containing a list of alternative patterns. This is similar to character classes from regular expressions ([abc] matches either a, b, or c).

  • anchor

    The anchor operator, ., is used to constrain the ways in which child patterns are matched. It has different behaviors depending on where it’s placed inside a query.

    When . is placed before the first child within a parent pattern, the child will only match when it is the first named node in the parent.

    Similarly, an anchor placed after a pattern’s last child will cause that child pattern to only match nodes that are the last named child of their parent.

    Finally, an anchor between two child patterns will cause the patterns to only match nodes that are immediate siblings.

  • capture

    When matching patterns, you may want to process specific nodes within the pattern. Captures allow you to associate names with specific nodes in a pattern, so that you can later refer to those nodes by those names. Capture names are written after the nodes that they refer to, and start with an @ character.

  • field name

    In general, it’s a good idea to make patterns more specific by specifying field names associated with child nodes. You do this by prefixing a child pattern with a field name followed by a colon.

  • group

    You can also use parentheses for grouping a sequence of sibling nodes.

    Any of the quantification operators (+, *, and ?) can also be applied to groups.

  • pattern

    A query consists of one or more patterns, where each pattern is an S-expression.

  • predicate

    You can also specify arbitrary metadata and conditions associated with a pattern by adding predicate S-expressions anywhere within your pattern. Predicate S-expressions start with a predicate name beginning with a # character (or . in emacs-tree-sitter). After that, they can contain an arbitrary number of @-prefixed capture names or strings.

    Note: predicates are not handled directly by the Tree-sitter C library. They are only exposed in a structured form so that higher-level code can perform the filtering. However, higher-level bindings to Tree-sitter, such as the Rust crate and the WebAssembly binding, implement a few common predicates like #eq? and #match?.

  • quantification operator

    You can match a repeating sequence of sibling nodes using the postfix + and * repetition operators, which work analogously to the + and * operators in regular expressions. The + operator matches one or more repetitions of a pattern, and the * operator matches zero or more. The postfix ? operator marks a pattern as optional, matching zero or one occurrence.

  • query

    A query consists of one or more patterns, where each pattern is an S-expression.

  • S-expression

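    An S-expression is a parenthesized list written in prefix form, such as (binary_expression (number_literal) (number_literal)). A query consists of one or more patterns, where each pattern is an S-expression.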

  • wildcard

    A wildcard node is represented with an underscore: (_) matches any named node, and a bare _ matches any named or anonymous node. This is similar to . in regular expressions.
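The anchor entry is easiest to see on a concrete list of siblings. The Python below is a toy model rather than a query engine. Each function hard-codes what one pattern (shown in its docstring) would capture among the children of a hypothetical `dotted_name` node for the text `a.b.c.d`. It assumes that anchors skip anonymous nodes such as the `.` tokens, which is in line with the "first named" and "last named" wording in the anchor entry.

```python
from itertools import combinations

# Children of a hypothetical `dotted_name` node for the text `a.b.c.d`,
# as (node type, is_named, text). The "." tokens are anonymous nodes.
CHILDREN = [
    ("identifier", True, "a"), (".", False, "."),
    ("identifier", True, "b"), (".", False, "."),
    ("identifier", True, "c"), (".", False, "."),
    ("identifier", True, "d"),
]


def named(children):
    """Drop anonymous children; the anchor constraints below skip them."""
    return [c for c in children if c[1]]


def pairs_without_anchor(children):
    """(dotted_name (identifier) @prev (identifier) @next): any ordered pair."""
    ids = [c[2] for c in named(children) if c[0] == "identifier"]
    return list(combinations(ids, 2))


def pairs_with_anchor(children):
    """(dotted_name (identifier) @prev . (identifier) @next): immediate siblings."""
    nodes = named(children)
    return [(x[2], y[2]) for x, y in zip(nodes, nodes[1:])
            if x[0] == y[0] == "identifier"]


def first_named(children):
    """(dotted_name . (identifier) @first): only the first named child."""
    nodes = named(children)
    return nodes[0][2] if nodes and nodes[0][0] == "identifier" else None


def last_named(children):
    """(dotted_name (identifier) @last .): only the last named child."""
    nodes = named(children)
    return nodes[-1][2] if nodes and nodes[-1][0] == "identifier" else None


print(pairs_with_anchor(CHILDREN))          # [('a', 'b'), ('b', 'c'), ('c', 'd')]
print(len(pairs_without_anchor(CHILDREN)))  # 6, e.g. ('a', 'c') is included
print(first_named(CHILDREN), last_named(CHILDREN))  # a d
```

Without the anchor, non-adjacent pairs such as ('a', 'c') and ('b', 'd') also qualify. With it, only consecutive identifiers do.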
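The capture, field name and wildcard entries can be tied together with a toy matcher. The Python below is a hypothetical sketch, not how Tree-sitter evaluates queries. A `Pattern` object stands in for a pattern such as `(binary_expression left: (_) @lhs right: (number) @rhs) @expr`. Only field-labeled children are checked, and each capture maps to a single node's text. Its `"_"` type follows the `(_)` form, which matches named nodes only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Toy syntax-tree node (hypothetical, not a Tree-sitter type)."""
    type: str
    text: str
    is_named: bool = True
    fields: dict[str, Node] = field(default_factory=dict)


@dataclass
class Pattern:
    """Toy stand-in for one query pattern: (type name: (...) ...) @capture."""
    type: str                      # "_" plays the role of the (_) wildcard
    fields: dict[str, Pattern] = field(default_factory=dict)
    capture: Optional[str] = None  # capture name without the leading "@"


def match(pattern: Pattern, node: Node) -> Optional[dict[str, str]]:
    """Return {capture name: node text} if `node` matches, else None."""
    if pattern.type == "_":
        if not node.is_named:      # (_) only matches named nodes
            return None
    elif pattern.type != node.type:
        return None
    captures: dict[str, str] = {}
    for name, sub in pattern.fields.items():
        child = node.fields.get(name)
        if child is None:          # the node has no child with this field name
            return None
        found = match(sub, child)
        if found is None:
            return None
        captures.update(found)
    if pattern.capture:
        captures[pattern.capture] = node.text
    return captures


# Hypothetical tree for `a + 1`.
expr = Node("binary_expression", "a + 1", fields={
    "left": Node("identifier", "a"),
    "operator": Node("+", "+", is_named=False),
    "right": Node("number", "1"),
})

# (binary_expression left: (_) @lhs right: (number) @rhs) @expr
query = Pattern("binary_expression", capture="expr", fields={
    "left": Pattern("_", capture="lhs"),
    "right": Pattern("number", capture="rhs"),
})
print(match(query, expr))  # {'lhs': 'a', 'rhs': '1', 'expr': 'a + 1'}

# Requiring an identifier in the `right` field makes the pattern fail.
print(match(Pattern("binary_expression",
                    fields={"right": Pattern("identifier")}), expr))  # None
```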
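The predicate note says the C library only exposes predicates in structured form and leaves the filtering to higher-level code. The Python below sketches what that filtering step could look like. It is not taken from any binding. It assumes each match arrives as a dict from capture name to node text and each predicate as a tuple such as ("#eq?", "@a", "@b"). Both representations, and the helper names, are hypothetical.

```python
from __future__ import annotations

import re


def resolve(arg: str, captures: dict[str, str]) -> str:
    """An "@name" argument stands for a capture's text; anything else is a literal."""
    return captures[arg[1:]] if arg.startswith("@") else arg


def satisfies(predicate: tuple[str, ...], captures: dict[str, str]) -> bool:
    """Evaluate one predicate, e.g. ("#eq?", "@a", "@b") or ("#match?", "@a", "^x")."""
    name, *args = predicate
    values = [resolve(a, captures) for a in args]
    if name == "#eq?":
        return values[0] == values[1]
    if name == "#match?":
        # Sketch choice: an unanchored regex search, so use ^...$ for whole-text matches.
        return re.search(values[1], values[0]) is not None
    raise ValueError(f"unsupported predicate: {name}")


def filter_matches(matches: list[dict[str, str]],
                   predicates: list[tuple[str, ...]]) -> list[dict[str, str]]:
    """Keep only the matches that satisfy every predicate attached to the pattern."""
    return [m for m in matches if all(satisfies(p, m) for p in predicates)]


# Hypothetical matches for ((identifier) @constant (#match? @constant "^[A-Z][A-Z_]+$")).
constants = [{"constant": "MAX_SIZE"}, {"constant": "max_size"}]
print(filter_matches(constants, [("#match?", "@constant", "^[A-Z][A-Z_]+$")]))
# [{'constant': 'MAX_SIZE'}]

# Hypothetical matches for a pattern with two captures and (#eq? @a @b).
pairs = [{"a": "x", "b": "x"}, {"a": "x", "b": "y"}]
print(filter_matches(pairs, [("#eq?", "@a", "@b")]))  # [{'a': 'x', 'b': 'x'}]
```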
