
@PhakornKiong
Created April 7, 2021 02:48

Understanding how Babel Works

One of the more interesting things I’ve done recently is learning how to modify JavaScript code programmatically using Babel. Babel is a JavaScript transpiler, best known for transpiling ECMAScript 2015+ (ES6+) code into a backwards-compatible version of JavaScript that can run in current and older browsers or environments.

What is a transpiler?

Transpilers are also known as source-to-source compilers. They are tools that read source code written in one programming language and produce equivalent code in another language. Transpilers are what make languages such as TypeScript and CoffeeScript possible, since both compile to JavaScript. A transpiler does what seems like magic in three main steps:

  1. Lexical Analysis (Tokenization) - Source Code to Token
  2. Syntax Analysis (Parsing) - Token to Abstract Syntax Tree
  3. Code Generation - Abstract Syntax Tree to Source Code

In the lexical analysis step, the lexical analyzer (let's call it a tokenizer) converts source code, given as a string, into an array of tokens according to a set of rules. The tokenizer scans the code character by character. When it encounters a symbol or whitespace, it decides whether the current word is complete and assigns it a type and a value. The resulting tokens, however, do not describe how the pieces fit together. They merely list the components of the input.

Babel does not expose its tokenizer as a separate package (its functionality is part of @babel/parser). However, Esprima provides an easy-to-use tokenizer that can be used to visualize the result of tokenization. The resulting tokens are the input to syntax analysis.

https://gist.github.com/71701ac547b7fffe3f6ac314727c3ba3
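To see what a tokenizer does without pulling in Esprima, here is a toy tokenizer in plain JavaScript. It is only a sketch. It understands a tiny, assumed subset of the language (a few keywords, identifiers, integers and single-character punctuators). It borrows Esprima's { type, value } token shape, but it is not Esprima's or Babel's implementation.

```javascript
// Toy tokenizer for an assumed subset of JavaScript. Not Esprima or Babel code;
// it only mimics the { type, value } shape of Esprima's tokens.
const KEYWORDS = new Set(["const", "let", "var", "function", "return"]);
const PUNCTUATORS = new Set(["=", "+", "-", "*", "/", ";", "(", ")", "{", "}", ","]);

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      // Whitespace only separates tokens; it produces no token itself.
      i++;
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Read a whole word, then decide whether it is a keyword or an identifier.
      let word = "";
      while (i < source.length && /[A-Za-z0-9_$]/.test(source[i])) word += source[i++];
      tokens.push({ type: KEYWORDS.has(word) ? "Keyword" : "Identifier", value: word });
    } else if (/[0-9]/.test(ch)) {
      // Integers only; decimals, exponents, etc. are out of scope for this toy.
      let num = "";
      while (i < source.length && /[0-9]/.test(source[i])) num += source[i++];
      tokens.push({ type: "Numeric", value: num });
    } else if (PUNCTUATORS.has(ch)) {
      tokens.push({ type: "Punctuator", value: ch });
      i++;
    } else {
      throw new SyntaxError(`Unexpected character '${ch}' at ${i}`);
    }
  }
  return tokens;
}

console.log(tokenize("const n = 1 + y;"));
```

Running it on const n = 1 + y; yields seven tokens: Keyword const, Identifier n, Punctuator =, Numeric 1, Punctuator +, Identifier y and Punctuator ;. Nothing in that list says that 1 + y is the value assigned to n. Recovering that structure is the parser's job.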

In the syntax analysis (parsing) step, the tokens produced by the tokenizer are converted into an Abstract Syntax Tree (AST).

An AST is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.

An AST represents the abstract structure of source code: it does not capture every detail of the concrete syntax, such as whitespace. ASTs can be used to compile code to other languages, transpile code (which is what Babel does), perform static analysis, generate source code and more.

https://gist.github.com/86569f0caed12b2f4afe9375523dd22c

The code above generates the following AST. Many parsers can generate an AST, and their output differs in minor details. You can try the AST viewer here.

[Figure: the AST generated for the code above]
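The parsing step can be sketched as a small recursive-descent parser over a hand-written token array. This is a toy, not @babel/parser. It only accepts statements of the form kind name = expression; with +, -, * and / over identifiers and integers. It emits a simplified tree that borrows Babel's node names (VariableDeclaration, BinaryExpression, NumericLiteral) while omitting locations and the File wrapper.

```javascript
// Toy recursive-descent parser: tokens -> simplified, Babel-style AST.
// Assumed grammar:
//   Statement  := ("const" | "let" | "var") Identifier "=" Expression ";"
//   Expression := Term (("+" | "-") Term)*
//   Term       := Primary (("*" | "/") Primary)*
//   Primary    := Identifier | Numeric
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  function expect(type, value) {
    const tok = next();
    if (!tok || tok.type !== type || (value !== undefined && tok.value !== value)) {
      throw new SyntaxError(`Expected ${value ?? type} at token ${pos - 1}`);
    }
    return tok;
  }
  function parsePrimary() {
    const tok = next();
    if (tok && tok.type === "Identifier") return { type: "Identifier", name: tok.value };
    if (tok && tok.type === "Numeric") return { type: "NumericLiteral", value: Number(tok.value) };
    throw new SyntaxError(`Unexpected token at ${pos - 1}`);
  }
  // Builds a left-associative chain of BinaryExpression nodes.
  function parseBinary(parseOperand, operators) {
    let left = parseOperand();
    while (peek() && peek().type === "Punctuator" && operators.includes(peek().value)) {
      const operator = next().value;
      left = { type: "BinaryExpression", operator, left, right: parseOperand() };
    }
    return left;
  }
  // * and / bind tighter than + and -, so they are parsed one level deeper.
  const parseTerm = () => parseBinary(parsePrimary, ["*", "/"]);
  const parseExpression = () => parseBinary(parseTerm, ["+", "-"]);

  const body = [];
  while (pos < tokens.length) {
    const kind = expect("Keyword").value;
    const id = { type: "Identifier", name: expect("Identifier").value };
    expect("Punctuator", "=");
    const init = parseExpression();
    expect("Punctuator", ";");
    body.push({ type: "VariableDeclaration", kind, declarations: [{ type: "VariableDeclarator", id, init }] });
  }
  return { type: "Program", body };
}

// Tokens for: const n = 1 + y;
const tokens = [
  { type: "Keyword", value: "const" },
  { type: "Identifier", value: "n" },
  { type: "Punctuator", value: "=" },
  { type: "Numeric", value: "1" },
  { type: "Punctuator", value: "+" },
  { type: "Identifier", value: "y" },
  { type: "Punctuator", value: ";" },
];
console.log(JSON.stringify(parse(tokens), null, 2));
```

Because * and / are parsed one level below + and -, a token sequence for let z = a + b * 2; produces a BinaryExpression for +. Its right child is the BinaryExpression for *. That nesting is exactly the structural information the flat token list lacked.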

In the code generation step, the (possibly modified) AST is used to generate the final code. In Babel, this is provided by @babel/generator.
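A toy generator only needs to walk the tree and print each node type. The sketch below is plain JavaScript, not @babel/generator. It handles only the few node types used here and ignores formatting, comments and source maps. It conservatively wraps nested binary expressions in parentheses so that operator precedence survives.

```javascript
// Toy code generator: simplified, Babel-style AST -> source string.
function generate(node) {
  // Parenthesize nested binary expressions so precedence survives printing.
  const operand = (n) => (n.type === "BinaryExpression" ? `(${generate(n)})` : generate(n));
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case "BinaryExpression":
      return `${operand(node.left)} ${node.operator} ${operand(node.right)}`;
    case "Identifier":
      return node.name;
    case "NumericLiteral":
      return String(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hand-built AST for: const n = 1 + y;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "n" },
          init: {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "NumericLiteral", value: 1 },
            right: { type: "Identifier", name: "y" },
          },
        },
      ],
    },
  ],
};

ast.body[0].declarations[0].id.name = "x"; // modify the tree directly
console.log(generate(ast)); // const x = 1 + y;
```

Renaming n to x on the tree and regenerating gives const x = 1 + y;. That is the whole transpiler loop in miniature.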

Babel

Putting the three steps together, we’re able to manipulate code with confidence by modifying the AST using Babel.

In the following example, we want to rename the variable n to x. The flow of the transformation is as follows:

  1. @babel/parser provides a parse method that converts source code into an AST.
  2. @babel/traverse provides a way to reach the nodes we’re interested in using the visitor pattern. We define an object literal (the visitor) whose methods are named after the node types they should process (the names follow @babel/types). In this example we’re looking for Identifier nodes, so the Identifier method is called once for each of the two identifiers, n and y. However, in real-world code where multiple visitors and plugins are used, a node can occasionally be visited more than once (rare, but it happens). The visitor should therefore check whether a node still needs transforming and skip it otherwise.
  3. @babel/generator turns the modified AST back into code.

https://gist.github.com/4ef9edac700c3be28e332ad0a036354e
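To make the traverse step concrete without installing Babel, here is a toy sketch in plain JavaScript. It is not @babel/traverse. The AST is hand-built (assuming a snippet like const n = 1 + y;), and traverse here is a hypothetical helper that simply dispatches on node.type.

```javascript
// Hand-built, simplified AST for: const n = 1 + y;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "const",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "n" },
          init: {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "NumericLiteral", value: 1 },
            right: { type: "Identifier", name: "y" },
          },
        },
      ],
    },
  ],
};

// Depth-first walk that calls visitor[node.type](node) whenever a method matches.
function traverse(node, visitor) {
  if (!node || typeof node.type !== "string") return;
  if (visitor[node.type]) visitor[node.type](node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => traverse(child, visitor));
    else if (value && typeof value === "object") traverse(value, visitor);
  }
}

const visited = [];
const renameN = {
  Identifier(node) {
    visited.push(node.name);
    // Only act on nodes that still match; a repeated visit becomes a no-op.
    if (node.name === "n") node.name = "x";
  },
};

traverse(ast, renameN);
console.log(visited); // [ 'n', 'y' ]
console.log(ast.body[0].declarations[0].id.name); // x
```

The name check in the visitor doubles as the guard mentioned above. If the same node is visited again, it no longer matches n, so the transform is skipped rather than applied twice.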

All in all, Babel takes care of all three steps for us, so we can focus on transforming the AST to meet our requirements. For common use cases, we can simply mix and match published plugins to get what we want.

Traversing the AST with Babel

To traverse an AST, we define an object literal whose methods accept particular node types in the tree (the node type names come from @babel/types). For example:

https://gist.github.com/3271b6b20c83165533758e5150be2fbd

The code above is a simple visitor that calls the Identifier() method for every Identifier node visited during traversal. We can also use aliases as visitor keys (for example, Function is an alias for FunctionDeclaration, FunctionExpression, ArrowFunctionExpression, ObjectMethod and ClassMethod).
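One way to picture alias support is to expand alias keys into concrete node types before traversal. The sketch below does that with a hard-coded, hypothetical alias table containing only the Function alias from the text. In Babel the alias definitions come from @babel/types, and this is not Babel's traversal code.

```javascript
// Hypothetical alias table: only the Function alias mentioned in the text.
const ALIASES = {
  Function: [
    "FunctionDeclaration",
    "FunctionExpression",
    "ArrowFunctionExpression",
    "ObjectMethod",
    "ClassMethod",
  ],
};

// Expand alias keys into concrete node types: { Function } -> { FunctionDeclaration, ... }.
function explode(visitor) {
  const handlers = {};
  for (const [key, fn] of Object.entries(visitor)) {
    for (const type of ALIASES[key] || [key]) {
      (handlers[type] = handlers[type] || []).push(fn);
    }
  }
  return handlers;
}

function traverse(root, visitor) {
  const handlers = explode(visitor);
  (function walk(node) {
    if (!node || typeof node.type !== "string") return;
    for (const fn of handlers[node.type] || []) fn(node);
    for (const value of Object.values(node)) {
      if (Array.isArray(value)) value.forEach(walk);
      else if (value && typeof value === "object") walk(value);
    }
  })(root);
}

// Simplified AST for: function add(a) { return () => a; }
const ast = {
  type: "Program",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "add" },
      params: [{ type: "Identifier", name: "a" }],
      body: {
        type: "BlockStatement",
        body: [
          {
            type: "ReturnStatement",
            argument: { type: "ArrowFunctionExpression", params: [], body: { type: "Identifier", name: "a" } },
          },
        ],
      },
    },
  ],
};

const functionTypes = [];
const identifierNames = [];
traverse(ast, {
  Function(node) { functionTypes.push(node.type); },
  Identifier(node) { identifierNames.push(node.name); },
});
console.log(functionTypes); // [ 'FunctionDeclaration', 'ArrowFunctionExpression' ]
```

With a single Function() method, both the function declaration and the arrow function inside it are visited.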

While traversing an AST to transform the source code, you should be aware of the order in which nodes are visited. You can run the following code and compare the output with AST Explorer to visualize the flow.

https://gist.github.com/9f936ffb9f9f982c90b1c4028ea055b0
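Babel visitors can also define enter and exit methods for a node type. The toy traversal below records both events for every node so the depth-first order is visible. One simplifying assumption: it visits children in object property order. Babel uses the visitor keys defined for each node type in @babel/types, so the order can differ for some nodes.

```javascript
// Simplified AST for: n + 1;
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "n" },
        right: { type: "NumericLiteral", value: 1 },
      },
    },
  ],
};

// Records "enter" before a node's children and "exit" after all of them,
// indented by depth. Children follow object property order (a simplification).
function traversalOrder(node, log = [], depth = 0) {
  if (!node || typeof node.type !== "string") return log;
  const indent = "  ".repeat(depth);
  log.push(`${indent}enter ${node.type}`);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => traversalOrder(child, log, depth + 1));
    else if (value && typeof value === "object") traversalOrder(value, log, depth + 1);
  }
  log.push(`${indent}exit ${node.type}`);
  return log;
}

console.log(traversalOrder(ast).join("\n"));
```

Each node is entered before any of its children and exited only after all of them. An exit handler on a parent therefore runs after its whole subtree has been processed.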

Paths

The path we get during traversal is an object representation of the link between nodes. Imagine the following code:

https://gist.github.com/1d7815c3c9de54547bc9238f4e8f5d02

The code above contains a FunctionDeclaration node and Identifier nodes. If we represent an Identifier as a path, we get something like the snippet below.

https://gist.github.com/f7c6108a54405662b86e8b616fc0cff9

Babel also attaches additional metadata and methods to each path to make AST transformation easier. In short, a path is a reactive representation of a node’s position in the tree, together with additional information about it. Whenever you call a method that modifies the tree, Babel updates this information for you. This makes working with the AST very straightforward.
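To show why the link between nodes is useful, here is a toy path object in plain JavaScript. It mimics a few fields that Babel’s paths also expose (node, parent, parentPath, key, listKey, container) plus a replaceWith method. It is a hypothetical sketch: real Babel paths carry much more, such as scope, and stay updated across far more operations. The AST below assumes a snippet like function square(n) { return n * n; }.

```javascript
// Toy NodePath: links a node to its parent and to the slot that holds it.
class NodePath {
  constructor(node, parentPath, container, key, listKey) {
    this.node = node;
    this.parentPath = parentPath;
    this.parent = parentPath ? parentPath.node : null;
    this.container = container; // object or array that directly holds the node
    this.key = key; // property name, or index when the container is an array
    this.listKey = listKey; // name of the array property, if any
  }
  // Swap this node for another one in its container and keep the path in sync.
  replaceWith(newNode) {
    this.container[this.key] = newNode;
    this.node = newNode;
  }
}

function traverseWithPaths(root, visitor) {
  (function walk(node, parentPath, container, key, listKey) {
    if (!node || typeof node.type !== "string") return;
    const path = new NodePath(node, parentPath, container, key, listKey);
    if (visitor[node.type]) visitor[node.type](path);
    const current = path.node; // the visitor may have replaced the node
    for (const [k, value] of Object.entries(current)) {
      if (Array.isArray(value)) value.forEach((child, i) => walk(child, path, value, i, k));
      else if (value && typeof value === "object") walk(value, path, current, k, undefined);
    }
  })(root, null, null, null, undefined);
}

// Simplified AST for: function square(n) { return n * n; }
const ast = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "square" },
  params: [{ type: "Identifier", name: "n" }],
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "ReturnStatement",
        argument: {
          type: "BinaryExpression",
          operator: "*",
          left: { type: "Identifier", name: "n" },
          right: { type: "Identifier", name: "n" },
        },
      },
    ],
  },
};

const seen = [];
traverseWithPaths(ast, {
  Identifier(path) {
    const where = path.listKey ? `${path.listKey}[${path.key}]` : path.key;
    seen.push(`${path.node.name} at ${path.parent.type}.${where}`);
    // Use the parent link to act only on the identifiers inside n * n.
    if (path.parent.type === "BinaryExpression") path.replaceWith({ type: "NumericLiteral", value: 2 });
  },
});
console.log(seen);
```

Because each path knows its container and key, replaceWith can swap the n inside n * n for a literal. The parameter n sits under a different parent, so it is left untouched.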

State

We need to be extremely careful about state during AST transformation. Let's take the following code:

https://gist.github.com/981083946717e21b5779d87bd9374cf2

Suppose we want to rename foo to baz inside the add function. A hacky transformation could look like this:

https://gist.github.com/0f8a01aac32ccc9486986da51389f3ec

This transformation looks like it should work: we change the parameter inside the FunctionDeclaration and the name inside each Identifier. However, the transformed source code actually looks like this:

https://gist.github.com/a563ca54c10747cc7a6bed692285a65d

Why? Because we modify every Identifier named foo, anywhere in the file, not only those inside add. The visitor runs over the whole tree and keeps no state about whether it is currently inside add. In a large codebase, this kind of accidental match is very likely.

How can we avoid this? We can run a nested traversal from inside the FunctionDeclaration visitor. The rename then only touches nodes within that function instead of polluting the global state.

https://gist.github.com/3480dd1cdc8c41144611b2316c03f941

The solution above is a very naïve example. Real-world transformation logic needs more checks to ensure the transformation only occurs on the nodes you’re interested in.
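The difference between the hacky rename and the nested traversal can be reproduced with a toy traverse helper over a hand-built AST. Both the helper and the source it models (function add(foo) { return foo + 1; } followed by const result = foo;) are assumptions for illustration. In Babel, the nested traversal would be path.traverse() called from the outer visitor.

```javascript
// Hypothetical source modeled by the AST:
//   function add(foo) { return foo + 1; }
//   const result = foo;
function buildAst() {
  return {
    type: "Program",
    body: [
      {
        type: "FunctionDeclaration",
        id: { type: "Identifier", name: "add" },
        params: [{ type: "Identifier", name: "foo" }],
        body: {
          type: "BlockStatement",
          body: [
            {
              type: "ReturnStatement",
              argument: {
                type: "BinaryExpression",
                operator: "+",
                left: { type: "Identifier", name: "foo" },
                right: { type: "NumericLiteral", value: 1 },
              },
            },
          ],
        },
      },
      {
        type: "VariableDeclaration",
        kind: "const",
        declarations: [
          {
            type: "VariableDeclarator",
            id: { type: "Identifier", name: "result" },
            init: { type: "Identifier", name: "foo" },
          },
        ],
      },
    ],
  };
}

// Depth-first walk that calls visitor[node.type](node) whenever a method matches.
function traverse(node, visitor) {
  if (!node || typeof node.type !== "string") return;
  if (visitor[node.type]) visitor[node.type](node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => traverse(child, visitor));
    else if (value && typeof value === "object") traverse(value, visitor);
  }
}

const renameFoo = {
  Identifier(node) {
    if (node.name === "foo") node.name = "baz";
  },
};

// Collects identifier names in visiting order, to compare the two results.
function identifierNames(root) {
  const names = [];
  traverse(root, { Identifier(node) { names.push(node.name); } });
  return names;
}

// Hacky: renames every foo in the file, including the one outside add().
const hacky = buildAst();
traverse(hacky, renameFoo);

// Nested: locate add() first, then run the rename on its subtree only.
const nested = buildAst();
traverse(nested, {
  FunctionDeclaration(node) {
    if (node.id.name === "add") traverse(node, renameFoo);
  },
});

console.log(identifierNames(hacky)); // [ 'add', 'baz', 'baz', 'result', 'baz' ]
console.log(identifierNames(nested)); // [ 'add', 'baz', 'baz', 'result', 'foo' ]
```

The nested version leaves the outer foo alone. It is still naïve: a nested function inside add that declares its own foo would be renamed too, which is exactly the scoping problem discussed next.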

Scope

Note that JavaScript uses lexical scoping. We can picture scopes as a tree in which every nested function or block has its own scope. Code in an inner scope can use bindings from an outer scope. It can also declare a binding with the same name (shadowing it) without modifying the outer one.

https://gist.github.com/33463d183b39238f0fe048fe899aebf0
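The tree of scopes can be modeled with a small Scope class in which each scope keeps its own bindings and a link to its parent. This is a hypothetical illustration of lexical lookup, not Babel’s Scope class. The two scopes model an assumed snippet (shown in the code comment) rather than the gist above.

```javascript
// Toy model of lexical scoping: each scope has its own bindings and a link to
// the enclosing scope; lookups walk outward until a binding is found.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.bindings = new Map(); // name -> description of the declaration
  }
  declare(name, info) {
    this.bindings.set(name, info);
  }
  // Returns the binding visible from this scope, or undefined if there is none.
  lookup(name) {
    for (let scope = this; scope; scope = scope.parent) {
      if (scope.bindings.has(name)) return scope.bindings.get(name);
    }
    return undefined;
  }
}

// Models:
//   const x = 1; const y = 2;
//   function inner() { const x = 10; return x + y; }
const globalScope = new Scope();
globalScope.declare("x", "global x");
globalScope.declare("y", "global y");
globalScope.declare("inner", "function inner");
const innerScope = new Scope(globalScope);
innerScope.declare("x", "inner x"); // shadows the outer x without modifying it

console.log(innerScope.lookup("x")); // inner x
console.log(innerScope.lookup("y")); // global y
console.log(globalScope.lookup("x")); // global x
```

Declaring x in the inner scope shadows the outer x for code inside inner(), while the global binding itself is untouched.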

When transforming a complex AST, we should be wary of the scope to avoid breaking existing code. Let’s look at the next example and identify the issue with the transformation.

https://gist.github.com/2e052f5ef9a4fb575219fa22c52f570f

Suppose we want to rename the first parameter of the add function from foo to x. We could easily reuse the logic from the previous example to do the transformation. However, we would end up with the following:

https://gist.github.com/666c5fc48637c60de4642d3d7116f9ea

At first glance, the transformation seems correct. However, the add function refers to x and y from the global scope. After the rename, x inside add refers to the renamed parameter instead of the global x. In a large codebase, we might not even notice this error, because it is very hard to keep track of closures and scoping by hand.

Nonetheless, let’s try to solve the example above.

https://gist.github.com/aefaf72383956a946248b712eb432df4

Although the code above is not perfect, it shows how complex and difficult it is to deal with scope during transformation. We have to keep track of every binding and reference while checking whether the new identifier name conflicts with an existing one. Luckily, Babel provides many useful methods on path.scope (you may refer to here).

With the power of Babel, we could simplify it down to the following:

https://gist.github.com/74ed156e276a52a59ee86afb68a173ae
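For comparison, a conservative version of the conflict check can be written by hand. The sketch below assumes the source function add(foo) { return foo + x + y; }, with x and y defined globally. renameParam and its _x, _x2, ... fallback naming are hypothetical choices, loosely in the spirit of Babel’s unique-name helpers rather than a copy of them. It treats any name used inside the function as taken. Unlike Babel’s scope tracking, it ignores shadowing by nested functions.

```javascript
// Hypothetical source modeled by the AST (x and y are declared globally):
//   function add(foo) { return foo + x + y; }
function buildAdd() {
  return {
    type: "FunctionDeclaration",
    id: { type: "Identifier", name: "add" },
    params: [{ type: "Identifier", name: "foo" }],
    body: {
      type: "BlockStatement",
      body: [
        {
          type: "ReturnStatement",
          argument: {
            type: "BinaryExpression",
            operator: "+",
            left: {
              type: "BinaryExpression",
              operator: "+",
              left: { type: "Identifier", name: "foo" },
              right: { type: "Identifier", name: "x" },
            },
            right: { type: "Identifier", name: "y" },
          },
        },
      ],
    },
  };
}

// Calls visit(node) on every node in a subtree.
function walk(node, visit) {
  if (!node || typeof node.type !== "string") return;
  visit(node);
  for (const value of Object.values(node)) {
    if (Array.isArray(value)) value.forEach((child) => walk(child, visit));
    else if (value && typeof value === "object") walk(value, visit);
  }
}

// Renames parameter oldName to `wanted`, unless `wanted` is already used inside
// the function (for example a reference to the global x); in that case it falls
// back to _wanted, _wanted2, ... Shadowing by nested functions is NOT handled.
function renameParam(fnNode, oldName, wanted) {
  const used = new Set();
  walk(fnNode, (node) => {
    if (node.type === "Identifier") used.add(node.name);
  });
  let name = wanted;
  for (let i = 1; used.has(name); i++) name = i === 1 ? `_${wanted}` : `_${wanted}${i}`;
  walk(fnNode, (node) => {
    if (node.type === "Identifier" && node.name === oldName) node.name = name;
  });
  return name;
}

const add = buildAdd();
console.log(renameParam(add, "foo", "x")); // _x
```

Asking for x returns _x because x is already referenced inside add, so the global x stays reachable. Asking for an unused name such as z simply renames the parameter.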

The next natural question is: what can Babel be used for? Here are some examples that you might be interested in:

  • Create custom syntax using Babel syntax plugins - Great tutorial by Tan Li Hau
  • Transform source code across an entire application (for example, replacing all old-fashioned anonymous functions with arrow functions)
  • Create custom linter using Babel transform plugins

Finally, have fun with Babel!

Useful Reference

Babel Plugin Handbook by Jamie Kyle

Babel in Depth - Architecture & Use Cases (Mandarin) by 荒山

AST Explorer
