Last active
November 28, 2023 16:01
-
-
Save makenowjust/2a09a2a9fd15c5a0a7a7935e5c47b73e to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# The MIT License | |
# | |
# Copyright 2023 Hiroya Fujinami (a.k.a. TSUYUSATO "MakeNowJust" Kitsune) | |
# | |
# Permission is hereby granted, free of charge, to any person obtaining a copy | |
# of this software and associated documentation files (the “Software”), to deal | |
# in the Software without restriction, including without limitation the rights | |
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
# copies of the Software, and to permit persons to whom the Software is | |
# furnished to do so, subject to the following conditions: | |
# | |
# The above copyright notice and this permission notice shall be included in | |
# all copies or substantial portions of the Software. | |
# | |
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
# SOFTWARE. | |
# `PrecParser` is an implementation of operator-precedence parsing. | |
# The operator-precedence parsing is also known as Pratt parsing. | |
# This implementation is extended for mixfix operators which are operators | |
# consists of multi parts (e.g. `... ? ... : ...` operator). | |
# | |
# ## Usage | |
# | |
# `PrecParser::Grammar` is a grammar of the target language and also behaves | |
# its parser. | |
# | |
# We have a `PrecParser::DSL` to easily build a `PrecParser::Grammar` instance. | |
# The below is an example grammar for simple arithmetic expressions. | |
# | |
# ``` | |
# grammar = PrecParser::DSL.build do | |
# prec :main => :add_sub | |
# | |
# prec :add_sub => :mul_div do | |
# left_assoc '+' do |left, op_tok, right| | |
# [:add, left, right] | |
# end | |
# left_assoc '-' do |left, op_tok, right| | |
# [:sub, left, right] | |
# end | |
# end | |
# | |
# prec :mul_div => :unary do | |
# left_assoc '*' do |left, op_tok, right| | |
# [:mul, left, right] | |
# end | |
# left_assoc '/' do |left, op_tok, right| | |
# [:div, left, right] | |
# end | |
# end | |
# | |
# prec :unary => :atom do | |
# prefix '+' do |op_tok, expr| | |
# [:plus, expr] | |
# end | |
# prefix '-' do |op_tok, expr| | |
# [:minus, expr] | |
# end | |
# end | |
# | |
# prec :atom do | |
# closed 'nat' do |nat_tok| | |
# [:nat, nat_tok.value] | |
# end | |
# closed '(', :add_sub, ')' do |lpar_tok, expr, rpar_tok| | |
# [:paren, expr] | |
# end | |
# end | |
# end | |
# ``` | |
# | |
# Here, `prec` is the only available method in the `PrecParser::DSL` context. | |
# This method is for defining a precedence and takes the argument of one of the | |
# following forms: `name => succs`, `name => succs`, and `name`. | |
# `prec name => succ` is an alias for `prec name => [succ]` and `prec name` is | |
# an alias for `prec name => []`. `succs` is an array of precedence names which | |
# have higher binding powers than this precedence's. Precedence names must be | |
# symbols. | |
# | |
# `prec` also can take a block to define operators belonging to this | |
# precedence. Operators can be defined by the following eight methods: | |
# | |
# - `prefix key, *parts, &block` | |
# - `postfix key, *parts, &block` | |
# - `non_assoc_prefix key, *parts, &block` | |
# - `non_assoc_postfix key, *parts, &block` | |
# - `closed key, *parts, &block` | |
# - `left_assoc key, *parts, &block` | |
# - `right_assoc key, *parts, &block` | |
# - `non_assoc key, *parts, &block` | |
# | |
# They define operators with their own name associativity. The first argument | |
# `key` specifies the key token that is used to determine which the operator to | |
# be parsed. The rest arguments are `parts`: they consist of precedence names | |
# or tokens, meaning that they are required after the key token. They can take | |
# a block, which is an action called on parsed. | |
# | |
# To parse a string using this grammar, we can use the `PrecParser::Grammar#parse` | |
# method. This method takes a `PrecParser::Stream` object. `PrecParser::Stream` | |
# is a token stream (an array) with convenient methods for parsing. It can be | |
# created easily by using `PrecParser::ArrayStream` class. | |
# | |
# ``` | |
# # A stream for the source `"1 + 2 * 3"`. | |
# stream = PrecParser::ArrayStream.new([ | |
# PrecParser::Token.new('nat', 1), | |
# PrecParser::Token.new('+'), | |
# PrecParser::Token.new('nat', 2), | |
# PrecParser::Token.new('*'), | |
# PrecParser::Token.new('nat', 3), | |
# ]) | |
# | |
# grammar.parse(stream) | |
# # => [:add, [:nat, 1], [:mul, [:nat, 2], [:nat, 3]]] | |
# ``` | |
# | |
# ## Limitations | |
# | |
# This implementation assumes that an operator can be determined by the current | |
# token on parsing. If a grammar is not so, this reports the "Conflict with the | |
# key token ..." error. | |
# | |
# For example, the following grammar is rejected because the key token `'['` | |
# is conflicted. | |
# | |
# ``` | |
# grammar = PrecParser::DSL.build do | |
# prec :range do | |
# closed '[', :int, ',', :int, ']' | |
# closed '[', :int, ',', :int, ')' | |
# end | |
# end | |
# # raises "Conflict with the key token '['" | |
# ``` | |
# | |
# ## References | |
# | |
# - [Operator-precedence parser - Wikipedia](https://en.wikipedia.org/wiki/Operator-precedence_parser) | |
# - [Parsing Mixfix Operators | Springer Link](https://link.springer.com/chapter/10.1007/978-3-642-24452-0_5) | |
module PrecParser | |
# `Grammar` is a grammar defined by operator precedences. | |
class Grammar | |
# @param main [Symbol] | |
# @param precs [Hash{Symbol => PrecParser::Prec}] | |
def initialize(main, precs) | |
@main = main | |
@precs = precs | |
end | |
# @param name [Symbol] | |
# @return [PrecParser::Prec] | |
def [](name) | |
prec = @precs[name] | |
raise "A prec '#{name}' is undefined" unless prec | |
prec | |
end | |
# @api private | |
# @param names [Array<Symbol>] | |
# @return [void] | |
def setup(names = [@main]) | |
names.each do |name| | |
prec = self[name] | |
prec.setup(self) | |
end | |
end | |
# @param stream [PrecParser::Stream] | |
# @param check_eof [Boolean] | |
# @return [Object] | |
def parse(stream, check_eof: true) | |
result = parse_prec(@main, stream) | |
stream.unexpected unless result | |
if check_eof | |
stream.expected(['EOF']) | |
stream.unexpected unless stream.eof? | |
end | |
result | |
end | |
# @api private | |
# @param names [Array<Symbol>] | |
# @param stream [PrecParser::Stream] | |
# @return [Object] | |
def parse_precs(names, stream) | |
names.each do |name| | |
result = parse_prec(name, stream) | |
return result if result | |
end | |
stream.unexpected | |
end | |
# @api private | |
# @param name [Symbol] | |
# @param stream [PrecParser::Stream] | |
# @return [Object, nil] | |
def parse_prec(name, stream) | |
prec = self[name] | |
prec.parse(self, stream) | |
end | |
# @api private | |
# @param parts [Array<String, Symbol>] | |
# @param stream [PrecParser::Stream] | |
# @return [Array<PrecParser::Token, Object>] | |
def parse_parts(parts, stream) | |
parts.map do |part| | |
if part.is_a?(Symbol) | |
result = parse_prec(part, stream) | |
stream.unexpected unless result | |
result | |
else | |
token = stream.current | |
unless token.type == part | |
stream.expected([part]) | |
stream.unexpected | |
end | |
stream.next | |
token | |
end | |
end | |
end | |
def inspect | |
result = "DSL.build #{@main.inspect} do\n" | |
@precs.each do |(name, prec)| | |
prec.inspect.lines(chomp: true) do |line| | |
result << " #{line}\n" | |
end | |
end | |
result << "end" | |
result | |
end | |
end | |
# `Prec` is an operator precedence. | |
class Prec | |
# @param name [Symbol] | |
# @param succs [Array<Symbol>] | |
# @param ops [Array<Op>] | |
def initialize(name, succs, ops) | |
@name = name | |
@succs = succs | |
@ops = ops | |
end | |
# @return [Symbol] | |
attr_reader :name | |
# @return [Array<Symbol>] | |
attr_reader :succs | |
# @return [Array<PrecParser::Op>] | |
attr_reader :ops | |
# @api private | |
# @return [Hash{String => PrecParser::Op}] | |
attr_reader :prefix_table | |
# @api private | |
# @return [Hash{String => PrecParser::Op}] | |
attr_reader :postfix_table | |
# @api private | |
# @param grammar [PrecParser::Grammar] | |
# @return [void] | |
def setup(grammar) | |
return unless @prefix_table.nil? && @postfix_table.nil? | |
@prefix_table = {} | |
@postfix_table = {} | |
@ops.each do |op| | |
case op.type | |
when :prefix, :non_assoc_prefix, :closed | |
if @prefix_table.include?(op.key) | |
raise "Conflict with the key token '#{op.key}'" | |
end | |
@prefix_table[op.key] = op | |
when :postfix, :non_assoc_postfix, :left_assoc, :right_assoc, :non_assoc | |
if @postfix_table.include?(op.key) | |
raise "Conflict with the key token '#{op.key}'" | |
end | |
@postfix_table[op.key] = op | |
end | |
end | |
grammar.setup(@succs) | |
@ops.each do |op| | |
grammar.setup(op.parts.filter { _1.is_a?(Symbol) }) | |
end | |
@all_prefix_keys = @prefix_table.keys | |
@all_postfix_keys = @postfix_table.keys | |
@prefix_keys = @prefix_table.filter { |key, op| op.type == :prefix }.keys | |
@right_assoc_keys = @postfix_table.filter { |key, op| op.type == :right_assoc }.keys | |
@postfix_and_left_assoc_keys = @postfix_table.filter { |key, op| op.type == :postfix || op.type == :left_assoc }.keys | |
end | |
# @api private | |
# @param grammar [PrecParser::Grammar] | |
# @param stream [PrecParser::Stream] | |
# @return [Object, nil] | |
def parse(grammar, stream) | |
op = @prefix_table[stream.current&.type] | |
if op | |
key_token = stream.current | |
stream.next | |
case op.type | |
when :prefix | |
parts = grammar.parse_parts(op.parts, stream) | |
stack = [[op, key_token, *parts]] | |
return parse_prefix_and_right_assoc(grammar, stream, stack) | |
when :non_assoc_prefix | |
parts = grammar.parse_parts(op.parts, stream) | |
node = grammar.parse_precs(@succs) | |
return op.build(key_token, *parts, node) | |
when :closed | |
parts = grammar.parse_parts(op.parts, stream) | |
return op.build(key_token, *parts) | |
else raise 'Unreachable' | |
end | |
end | |
stream.expected(@all_prefix_keys) | |
node = grammar.parse_precs(@succs, stream) | |
key_token = stream.current | |
op = @postfix_table[key_token&.type] | |
unless op | |
stream.expected(@all_postfix_keys) | |
return node | |
end | |
stream.next | |
case op.type | |
when :postfix | |
parts = grammar.parse_parts(op.parts, stream) | |
node = op.build(node, key_token, *parts) | |
return parse_postfix_and_left_assoc(grammar, stream, node) | |
when :non_assoc_postfix | |
parts = grammar.parse_parts(op.parts, stream) | |
return op.build(node, key_token, *parts) | |
when :left_assoc | |
parts = grammar.parse_parts(op.parts, stream) | |
right = grammar.parse_precs(@succs, stream) | |
node = op.build(node, key_token, *parts, right) | |
return parse_postfix_and_left_assoc(grammar, stream, node) | |
when :right_assoc | |
parts = grammar.parse_parts(op.parts, stream) | |
stack = [[op, node, key_token, *parts]] | |
return parse_prefix_and_right_assoc(grammar, stream, stack) | |
when :non_assoc | |
parts = grammar.parse_parts(op.parts, stream) | |
right = grammar.parse_precs(@succs, stream) | |
return op.build(node, key_token, *parts, right) | |
else raise 'Unreachable' | |
end | |
end | |
# @api private | |
# @param grammar [PrecParser::Grammar] | |
# @param stream [PrecParser::Stream] | |
# @param stack [Array<Array<PrecParser::Op, PrecParser::Token, Object>>] | |
# @return [Object] | |
def parse_prefix_and_right_assoc(grammar, stream, stack) | |
continue = true | |
node = nil | |
while continue do | |
key_token = stream.current | |
op = @prefix_table[key_token&.type] | |
while op.is_a?(Op) && op.type == :prefix | |
stream.next | |
parts = grammar.parse_parts(op.parts, stream) | |
stack << [op, key_token, *parts] | |
key_token = stream.current | |
op = @prefix_table[key_token.type] | |
end | |
stream.expected(@prefix_keys) | |
node = grammar.parse_precs(@succs, stream) | |
key_token = stream.current | |
op = @postfix_table[key_token&.type] | |
if op.is_a?(Op) && op.type == :right_assoc | |
stream.next | |
parts = grammar.parse_parts(op.parts, stream) | |
stack << [op, node, key_token, *parts] | |
continue = true | |
else | |
stream.expected(@right_assoc_keys) | |
continue = false | |
end | |
end | |
stack.reverse_each do |(op, *args)| | |
node = op.build(*args, node) | |
end | |
node | |
end | |
# @api private | |
# @param grammar [PrecParser::Grammar] | |
# @param stream [PrecParser::Stream] | |
# @param node [Object] | |
# @return [Object] | |
def parse_postfix_and_left_assoc(grammar, stream, node) | |
key_token = stream.current | |
op = @postfix_table[key_token&.type] | |
while op.is_a?(Op) && (op.type == :postfix || op.type == :left_assoc) | |
stream.next | |
case op.type | |
when :postfix | |
parts = grammar.parse_parts(op.parts, stream) | |
node = op.build(node, key_token, *parts) | |
when :left_assoc | |
parts = grammar.parse_parts(op.parts, stream) | |
right = grammar.parse_precs(@succs, stream) | |
node = op.build(node, key_token, *parts, right) | |
end | |
key_token = stream.current | |
op = @postfix_table[key_token.type] | |
end | |
stream.expected(@postfix_and_left_assoc_keys) | |
node | |
end | |
def inspect | |
result = "prec #{name.inspect}" | |
unless succs.empty? | |
result << " => #{succs.inspect}" | |
end | |
unless ops.empty? | |
result << " do\n" | |
ops.each do |op| | |
result << " #{op.inspect}\n" | |
end | |
result << "end" | |
end | |
result | |
end | |
end | |
# `Op` is an operator. | |
class Op | |
# @param type [:prefix, :postfix, :non_assoc_prefix, :non_assoc_postfix, :closed] | |
# @param type [:left_assoc, :right_assoc, :non_assoc] | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
def initialize(type, key, parts, build) | |
@type = type | |
@key = key | |
@parts = parts | |
@build = build | |
end | |
# @return [:prefix, :postfix, :non_assoc_prefix, :non_assoc_postfix, :closed] | |
# @return [:left_assoc, :right_assoc, :non_assoc] | |
attr_reader :type | |
# @return [String] | |
attr_reader :key | |
# @return [Array<String, Symbol>] | |
attr_reader :parts | |
# @param args [Array<PrecParser::Token, Object>] | |
# @return [Object] | |
def build(*args) | |
@build.call(*args) | |
end | |
def inspect | |
"#{type} #{([key] + parts).map(&:inspect).join(', ')}" | |
end | |
end | |
# `Stream` is a token stream. | |
# | |
# @abstract `current`, `next` and `eof?` must be implemented. | |
class Stream | |
def initialize | |
@expected = [] | |
end | |
# @return [void] | |
def next | |
@expected = [] | |
end | |
# @param expected [Array<String>] | |
# @return [void] | |
def expected(expected) | |
@expected += expected | |
end | |
# @return [void] | |
def unexpected | |
token = current&.type || 'EOF' | |
if @expected.empty? | |
raise "Unexpected token '#{token}'" | |
else | |
raise "Expected token(s) #{@expected.sort.map { |tok| "'#{tok}'" }.join(", ")}, but the unexpected token '#{token}' comes" | |
end | |
end | |
end | |
# `ArrayStream` is a simple implementation of `PrecParser::Stream` with an array. | |
class ArrayStream < Stream | |
# @param tokens [Array<PrecParser::Token>] | |
def initialize(tokens) | |
super() | |
@tokens = tokens | |
@index = 0 | |
end | |
# @return [PrecParser::Token, nil] | |
def current | |
@tokens[@index] | |
end | |
# @return [void] | |
def next | |
super | |
@index += 1 | |
@tokens[@index] | |
end | |
# @return [Boolean] | |
def eof? | |
@index >= @tokens.size | |
end | |
end | |
# `Token` is a token. | |
class Token | |
# @param type [String] | |
# @param value [Object] | |
# @param loc [Range, nil] | |
def initialize(type, value = nil, loc: nil) | |
@type = type | |
@value = value | |
@loc = loc | |
end | |
# @return [String] | |
attr_reader :type | |
# @return [Object] | |
attr_reader :value | |
# @return [Range, nil] | |
attr_reader :loc | |
end | |
# `RegexpLexer` is a lexer using regexp patterns. | |
class RegexpLexer | |
# @param skip [Regexp] | |
# @param pattern [Regexp] | |
# @param block [Proc] | |
def initialize(skip:, pattern:, &block) | |
@skip = /\G#{skip}/ | |
@pattern = /\G#{pattern}/ | |
@block = block | |
end | |
# @return [Regexp] | |
attr_reader :skip | |
# @return [Regexp] | |
attr_reader :pattern | |
# @return [Proc] | |
attr_reader :block | |
# @param source [String] | |
# @return [PrecParser::RegexpStream] | |
def lex(source) | |
RegexpStream.new(self, source) | |
end | |
end | |
# @api private | |
class RegexpStream < Stream | |
# @param lexer [PrecParser::Lexer] | |
# @param source [String] | |
def initialize(lexer, source) | |
super() | |
@lexer = lexer | |
@source = source | |
@offset = 0 | |
@current = nil | |
self.next | |
end | |
# @return [PrecParser::Token, nil] | |
def current | |
@current | |
end | |
# @return [void] | |
def next | |
super | |
skip = @source.match(@lexer.skip, @offset) | |
@offset = skip.end(0) if skip | |
return nil if eof? | |
match = @source.match(@lexer.pattern, @offset) | |
unless match | |
raise "Unexpected character: #{@source[@offset].inspect}" | |
end | |
type, value = @lexer.block.call(match) | |
loc = match.begin(0)...match.end(0) | |
@offset = match.end(0) | |
@current = Token.new(type, value, loc:) | |
end | |
# @return [Boolean] | |
def eof? | |
@source.size <= @offset | |
end | |
end | |
# `DSL` provides a DSL to construct `PrecParser::Grammar` objects. | |
# It is also a context of blocks of the `PrecParser::DSL.build` method. | |
class DSL | |
# @param main [Symbol] | |
# @param block [Proc] | |
# @return [PrecParser::Grammar] | |
def self.build(main = :main, &block) | |
context = DSL.new | |
context.instance_eval(&block) | |
grammar = Grammar.new(main, context.precs) | |
grammar.setup | |
grammar | |
end | |
# @api private | |
def initialize | |
@precs = {} | |
end | |
# @api private | |
attr_reader :precs | |
# @param name [Symbol, Hash{Symbol => Symbol}, Hash{Symbol => Array<Symbol>}] | |
# @return [void] | |
def prec(name, &block) | |
name, succs = | |
if name.is_a?(Hash) | |
if name.size != 1 | |
raise '`prec` should be `prec name => succs` form' | |
end | |
name, succs = name.first | |
[name, succs.is_a?(Array) ? succs : [succs]] | |
else | |
[name, []] | |
end | |
if @precs.include?(name) | |
raise "A prec '#{name}' is already defined" | |
end | |
@precs[name] = PrecDSL.build(name, succs, &block) | |
end | |
# `PrecDSL` is a context of blocks of the `PrecParser::DSL#prec` method. | |
class PrecDSL | |
# @api private | |
# @param name [Symbol] | |
# @param succs [Array<Symbol>] | |
# @param block [Proc] | |
# @return [PrecParser::] | |
def self.build(name, succs, &block) | |
context = PrecDSL.new | |
context.instance_eval(&block) if block | |
Prec.new(name, succs, context.ops) | |
end | |
# @api private | |
def initialize | |
@ops = [] | |
end | |
# @api private | |
# @return [Array<PrecParser::Op>] | |
attr_reader :ops | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def prefix(key, *parts, &build) | |
@ops << Op.new(:prefix, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def postfix(key, *parts, &build) | |
@ops << Op.new(:postfix, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def non_assoc_prefix(key, *parts, &build) | |
@ops << Op.new(:non_assoc_prefix, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def non_assoc_postfix(key, *parts, &build) | |
@ops << Op.new(:non_assoc_postfix, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def closed(key, *parts, &build) | |
@ops << Op.new(:closed, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def left_assoc(key, *parts, &build) | |
@ops << Op.new(:left_assoc, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def right_assoc(key, *parts, &build) | |
@ops << Op.new(:right_assoc, key, parts, build) | |
end | |
# @param key [String] | |
# @param parts [Array<String, Symbol>] | |
# @param build [Proc] | |
# @return [void] | |
def non_assoc(key, *parts, &build) | |
@ops << Op.new(:non_assoc, key, parts, build) | |
end | |
end | |
end | |
end |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment