I wrote a toy parser combinator using Dart 3.
[Warning, the current Dart compiler crashes when running this!]
To enable inline classes, add these lines to your analysis_options.yaml
file:
enable-experiment:
- inline-class
A parser is a function that takes some input and returns a tuple of some result representing the structure of the parsed input together with the remaining input. A combinator is a higher-order function that takes parsers as input and returns a new parser that somehow combines their behaviors.
Here is my definition of a Parser<T>
:
typedef Parser<T> = (Result<T>, Input) Function(Input);
Let's represent the Result<T>
as a sealed class cluster:
sealed class Result<T> {}
final class Success<T> extends Result<T> {
Success(this.value);
final T value;
}
final class Failure<T> extends Result<T> {
Failure(this.error);
final String error;
}
sealed
is a new keyword for abstract
classes where all subclasses are known at package level. Also, a final
class is a class that must not be subclassed. This way, we can switch
over all alternatives.
I want to use an inline
class for Input
in order to distinguish it from ordinary strings and to gain some additional type safety:
inline class Input {
Input(this.source);
final String source;
}
The most trivial parser is one that always succeeds:
Parser<T> succeed<T>(T value) {
return (input) => (Success(value), input);
}
Another trivial parser is one that always fails:
Parser<T> fail<T>(String error) {
return (input) => (Failure(error), input);
}
Here's a parser that matches and consumes a given string if it is at the beginning of the input:
Parser<void> consume(String s) {
return (input) {
if (input.startsWith(s)) {
return (Success(null), input.consume(s.length));
}
return (Failure('expected $s'), input);
};
}
To make this work, we add two methods to Input
:
inline class Input {
...
bool startsWith(Pattern pattern) => source.startsWith(pattern);
Input consume(int length) => Input(source.substring(length));
}
Here's a parser that accepts foo
:
final fooParser = consume('foo');
To use this parser on some input, we add an extension to Parser<T>
:
extension ParserExtensions<T> on Parser<T> {
Result<T> parse(String source) {
final (result, _) = this(Input(source));
return result;
}
}
This should return a Success
:
fooParser.parse('foobar');
This should return a Failure('expected foo')
:
fooParser.parse('baz');
To combine parsers, let's assume we have defined parsers that accept the 10 digits 0
to 9
. The oneOf
combinator will then create a parser that accepts any one digit. For reasons I don't understand, Dart fails to infer oneOf
's type. If you first create that array of parsers as a variable and then initialize oneOf
with that variable, it works as expected. Bummer.
final zeroParser = consume('0');
final oneParser = consume('1');
final digitParser = oneOf<void>([zeroParser, oneParser]);
Here's the oneOf
implementation, which is a bit more evolved than simply trying out all parsers because I want to collect all error messages and combine them into a single error message.
Parser<T> oneOf<T>(List<Parser<T>> parsers) {
return (input) {
final errors = <String>[];
for (final parser in parsers) {
final (result, next) = parser(input);
if (result is Success<T>) {
return (result, next);
} else if (result is Failure<T>) {
errors.add(result.error);
}
}
return (Failure(errors.join(' or ')), input);
};
}
BTW, either I didn't understand sealed
correctly or that feature isn't implemented yet. I thought that because the compiler knows that Success
and Failure
are the only subclasses of Result
, a simple else
would have been enough. Let's use switch
instead:
Parser<T> oneOf<T>(List<Parser<T>> parsers) {
return (input) {
final errors = <String>[];
for (final parser in parsers) {
switch (parser(input)) {
case (:Result<T> result, :Input next):
return (result, next);
case (Failure(:var error), _):
errors.add(error);
}
}
return (Failure(errors.join(' or ')), input);
};
}
This should return a Failure('expected 0 or expected 1')
:
digitParser.parse('aaa');
And this should work:
digitParser.parse('101');
To parse many digits at once, a many
combinator is useful:
Parser<List<T>> many<T>(Parser<T> parser) {
return (input) {
final results = <T>[];
while (true) {
switch (parser(input)) {
case (Success(:T value), :Input next):
results.add(value);
input = next;
default:
return (Success(results), input);
}
}
};
}
Then it's possible to create a parser for zero or more digits like so:
final digitsParser = many(digitParser);
digitsParser.parse('1001');
Because consume returns void, I do not get a value. This is by design. Let's add a map
method, that transforms its receiver into another parser that will transform a successful result to something more meaningful.
extension ParserExtensions<T> on Parser<T> {
...
Parser<U> map<U>(U Function(T) transform) {
return (input) => switch (this(input)) {
(Success<T>(:var value), :Input next) => (Success(transform(value)), next),
(Failure<T>(:var error), :Input next) => (Failure<U>(error), next),
};
}
Parser<U> value<U>(U value) => map((_) => value);
}
I've also added a trivial application of map
called value
which ignores the given value and returns a given constant.
This is handy to provide values to all digit parsers:
final zeroParser = consume('0').value(0);
final oneParser = consume('1').value(1);
final digitParser = oneOf<int>([zeroParser, oneParser]);
The digits parser would now return a list of zeros and ones (as I omitted all other digits as before, you can add them). I can reduce that list to a single value:
final digitsParser = many(digitParser).map((values) => //
values.fold(0, (value, digit) => value * 2 + digit));
This should print 9:
print(digitsParser.parse('1001'));
Right now, zero digits are a valid result. Here's a many1
combinator that will call a given parser at least once. Instead of implementing the anonymous parser function myself, I will combine it from more primitive parsers, using a new andThen
parser (which is sometimes called flatMap
):
Parser<List<T>> many1<T>(Parser<T> parser) {
return parser.andThen((result) => switch (result) {
Success<T>(:var value) => many(parser).map((values) => [value] + values),
Failure<T>(:var error) => fail(error),
});
}
Again, I struggle with the switch exhaustiveness.
Here is the implementation of andThen
(aka flatMap
):
extension ParserExtensions<T> on Parser<T> {
Parser<U> andThen<U>(Parser<U> Function(Result<T> result) continuation) {
return (input) {
final (result, next) = this(input);
return continuation(result)(next);
};
}
}
Most if not all combinators could be implemented using this function, BTW, but lets not go there. Instead, I will just demonstrate sequencing parsers.
Unfortunately, I cannot call this parser &
, because operator
declarations don't support generic types in Dart. So and
it is, and it will take another parser and apply both sequentially, returning a tuple of the result values.
extension ParserExtensions<T> on Parser<T> {
...
Parser<(T, U)> and<U>(Parser<U> parser) {
return andThen((result) => switch (result) {
Success<T>(:var value) => parser.map((value2) => (value, value2)),
Failure<T>(:var error) => fail(error),
});
}
}
Often, I don't need both values. Therefore, I create three more parsers which return only the first value, only the second value or no value at all:
extension ParserExtensions<T> on Parser<T> {
...
Parser<T> and1<U>(Parser<U> parser) => and(parser).map((tuple) => tuple.$1);
Parser<U> and2<U>(Parser<U> parser) => and(parser).map((tuple) => tuple.$2);
Parser<void> and0<U>(Parser<U> parser) => and(parser).value(null);
}
Also, let's define an "or" parser, this time as an |
operator, because I can:
extension ParserExtensions<T> on Parser<T> {
...
Parser<T> operator |(Parser<T> parser) {
return andThen((result) => switch (result) {
Success<T>(:var value) => succeed(value),
Failure<T>() => parser,
});
}
}
I can now parse a negative number like so:
final positive = many1(digitParser);
final negative = consume('-').and2(positive).map((v) => -v);
final integer = positive | negative;
Parsing a comma separated list of integers? Here you go:
Parser<List<T>> Function(Parser<T> p) sepBy1<T>(Parser<dynamic> sep) {
return (p) => p.and(many(sep.and2(p))).map((t) => [t.$1] + t.$2);
}
final comma = consume(',');
final listOfInt = sepBy1<int>(comma)(integer);
(Dart fails to infer the parser's type, it really needs to become smarter)
Parsing that list within parenthesis? This time I will uncurry the parser:
Parser<T> between<T>(Parser begin, Parser end, Parser<T> p) {
return begin.and2(p.and1(end));
}
final parenthesizedList = between(consume('('), consume(')'), listOfInt);
Once the foundation is laid down, increasingly more complex parsers become very easy to construct. And Dart's new tuple and pattern matching features make writing code quite elegant.
Let's go one step back and create a consumeIf
parser so it's easier to parse different digits without the need to combine trivial digit parsers:
Parser<void> consumeIf(bool Function(int ch) test) {
return (input) {
return switch (input.consumeIf(test)) {
Input next => (Success(null), next),
_ => (Failure('no match'), input),
};
};
}
We need this new method in Input
:
inline class Input {
...
Input? consumeIf(bool Function(int ch) test) {
return (source.isNotEmpty && test(source.codeUnitAt(0))) ? consume(1) : null;
}
}
Let's assume there is a one-to-one mapping between characters (grapheme clusters) and code points (which is wrong), we could create a consumeDigit
-Parser like so:
final consumeDigit = consumeIf((ch) => ch >= 48 && ch < 58);
We could also create primitive parsers like consumeWhile
or consumeSome
or consumeRange
or whatever you like as foundation building block. If you care for speed, allowing regular expressions would also be an option.
Here's a typical definition of an identifier that starts with a (ASCII) letter which is followed by (ASCII) letters or digits.
final consumeLetter = consumeIf((ch) => ch >= 65 && ch < 91 || ch >= 97 && ch < 123);
final consumeIdentifier = consumeLetter.and0(many(consumeLetter | consumeDigit));
To get parsed string, we use a get
parser like so:
extension ParserExtensions<T> on Parser<T> {
...
Parser<String> get get {
return (input) => switch (this(input)) {
(Success<T>(), :Input next) => (Success(input.without(next)), next),
(Failure<T>(:var error), :Input next) => (Failure(error), next),
_ => throw Error(), // why?!
};
}
}
And then we can use get
to create an identifer
parser:
final identifier = consumeIdentifier.get;
Here's how to parse a floating point value:
final nothing = succeed<void>(null);
final number = digits.and0(consume('.').and0(digits) | nothing).get.map(double.parse);
Alternatively, define an opt
parser for parsing something or nothing.
Parser<T?> opt<T>(Parser<T> p) => p.map<T?>((v) => v) | succeed(null);
final number2 = opt(consume('-')).and(digits.and(opt(consume('.').and(digits)))).get.map(double.parse);
Also notice, that double.parse
might throw an exception and because no parser does so, we should implement a slightly more sophisticated map
method that catches exceptions and returns them as Failure
.
However, implementing this feature is left as an exercise for the reader.
Seems like petitparser is gonna get a whole lot simpler. :)