Let us understand programming as converting strings to another machine-readable or executable output with a program according to a specification (called compiler). Now let the sole layout as graphical positioning be defined by additional white spaces and newlines.
Source code formatting ensures stylistic consistency to improve readability and eases on refactoring and writing code. All formatters that enable custom user data come with an escape hatch for visualization:
// zig fmt: off
const matrix = {{1, 1, 1},
{2, 2, 2},
{3, 3, 3}};
// zig fmt: on
is more readable than
const matrix = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
However, the user expects an input without zig fmt: off
and on
lines to
convert the input
const matrix = {{1, 1, 1},
{2, 2, 2},
{3, 3, 3}};
on running zig fmt
to
const matrix = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
There is no semantic reason why an editor should not be able to do the same
transformation, if it can store the Abstract Syntax Tree of the input with
the additional offsets of line 2 and 3 different from "where the next tokens
are expected".
For example, const matrix = {{1, 1, 1},
, {2, 2, 2},
and {3, 3, 3}};
have matching layout.
In between, are either 1 newline with white spaces or only 1 white space.
More complex is utilizing optional brackets, as they require a lookahead to count number of statements between {}-brackets.
if (condition) runFunction();
if (condition)
runFunction();
// alternative
if (condition) {
runFunction();
}
Zig specifically has one design aspect, which makes it very nice to use, but limits formatting performance (in theory):
const matrix = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3},};
is formatted to
const matrix = {{1, 1, 1},
{2, 2, 2},
{3, 3, 3},};
Thus the formatter needs to traverse the full bracket to check existence of a comma to identify the correct layout, which can be slow for huge files.
One simple, but unergonomic to code (you can not simply append some code),
workaround would be to use a comma prefix ({,{1, 1, 1}...{3, 3, 3}}
)
instead of postfix ({{1, 1, 1}...{3, 3, 3},}
).
Summary
- Comparison against the "normalized layout" is possible, even for sections with disabled formatter.
- The difference of tokens in between can be stored and versioned.
The next article will outline consequences, if we try to store the changes efficiently. Use cases are undo files, source code version control, source code reduction and generally anything which rewrites AST and needs to take into consideration the user formatting of source code.
With that in hindsight, I decided to not follow up this idea. Without unicode support, this looks however feasible to me.