Skip to content

Instantly share code, notes, and snippets.

@makenowjust
Last active November 28, 2023 16:01
Show Gist options
  • Save makenowjust/2a09a2a9fd15c5a0a7a7935e5c47b73e to your computer and use it in GitHub Desktop.
Save makenowjust/2a09a2a9fd15c5a0a7a7935e5c47b73e to your computer and use it in GitHub Desktop.
# The MIT License
#
# Copyright 2023 Hiroya Fujinami (a.k.a. TSUYUSATO "MakeNowJust" Kitsune)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the “Software”), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
# `PrecParser` is an implementation of operator-precedence parsing.
# The operator-precedence parsing is also known as Pratt parsing.
# This implementation is extended for mixfix operators which are operators
# consists of multi parts (e.g. `... ? ... : ...` operator).
#
# ## Usage
#
# `PrecParser::Grammar` is a grammar of the target language and also behaves
# its parser.
#
# We have a `PrecParser::DSL` to easily build a `PrecParser::Grammar` instance.
# The below is an example grammar for simple arithmetic expressions.
#
# ```
# grammar = PrecParser::DSL.build do
# prec :main => :add_sub
#
# prec :add_sub => :mul_div do
# left_assoc '+' do |left, op_tok, right|
# [:add, left, right]
# end
# left_assoc '-' do |left, op_tok, right|
# [:sub, left, right]
# end
# end
#
# prec :mul_div => :unary do
# left_assoc '*' do |left, op_tok, right|
# [:mul, left, right]
# end
# left_assoc '/' do |left, op_tok, right|
# [:div, left, right]
# end
# end
#
# prec :unary => :atom do
# prefix '+' do |op_tok, expr|
# [:plus, expr]
# end
# prefix '-' do |op_tok, expr|
# [:minus, expr]
# end
# end
#
# prec :atom do
# closed 'nat' do |nat_tok|
# [:nat, nat_tok.value]
# end
# closed '(', :add_sub, ')' do |lpar_tok, expr, rpar_tok|
# [:paren, expr]
# end
# end
# end
# ```
#
# Here, `prec` is the only available method in the `PrecParser::DSL` context.
# This method is for defining a precedence and takes the argument of one of the
# following forms: `name => succs`, `name => succs`, and `name`.
# `prec name => succ` is an alias for `prec name => [succ]` and `prec name` is
# an alias for `prec name => []`. `succs` is an array of precedence names which
# have higher binding powers than this precedence's. Precedence names must be
# symbols.
#
# `prec` also can take a block to define operators belonging to this
# precedence. Operators can be defined by the following eight methods:
#
# - `prefix key, *parts, &block`
# - `postfix key, *parts, &block`
# - `non_assoc_prefix key, *parts, &block`
# - `non_assoc_postfix key, *parts, &block`
# - `closed key, *parts, &block`
# - `left_assoc key, *parts, &block`
# - `right_assoc key, *parts, &block`
# - `non_assoc key, *parts, &block`
#
# They define operators with their own name associativity. The first argument
# `key` specifies the key token that is used to determine which the operator to
# be parsed. The rest arguments are `parts`: they consist of precedence names
# or tokens, meaning that they are required after the key token. They can take
# a block, which is an action called on parsed.
#
# To parse a string using this grammar, we can use the `PrecParser::Grammar#parse`
# method. This method takes a `PrecParser::Stream` object. `PrecParser::Stream`
# is a token stream (an array) with convenient methods for parsing. It can be
# created easily by using `PrecParser::ArrayStream` class.
#
# ```
# # A stream for the source `"1 + 2 * 3"`.
# stream = PrecParser::ArrayStream.new([
# PrecParser::Token.new('nat', 1),
# PrecParser::Token.new('+'),
# PrecParser::Token.new('nat', 2),
# PrecParser::Token.new('*'),
# PrecParser::Token.new('nat', 3),
# ])
#
# grammar.parse(stream)
# # => [:add, [:nat, 1], [:mul, [:nat, 2], [:nat, 3]]]
# ```
#
# ## Limitations
#
# This implementation assumes that an operator can be determined by the current
# token on parsing. If a grammar is not so, this reports the "Conflict with the
# key token ..." error.
#
# For example, the following grammar is rejected because the key token `'['`
# is conflicted.
#
# ```
# grammar = PrecParser::DSL.build do
# prec :range do
# closed '[', :int, ',', :int, ']'
# closed '[', :int, ',', :int, ')'
# end
# end
# # raises "Conflict with the key token '['"
# ```
#
# ## References
#
# - [Operator-precedence parser - Wikipedia](https://en.wikipedia.org/wiki/Operator-precedence_parser)
# - [Parsing Mixfix Operators | Springer Link](https://link.springer.com/chapter/10.1007/978-3-642-24452-0_5)
module PrecParser
# `Grammar` is a grammar defined by operator precedences.
class Grammar
# @param main [Symbol]
# @param precs [Hash{Symbol => PrecParser::Prec}]
def initialize(main, precs)
@main = main
@precs = precs
end
# @param name [Symbol]
# @return [PrecParser::Prec]
def [](name)
prec = @precs[name]
raise "A prec '#{name}' is undefined" unless prec
prec
end
# @api private
# @param names [Array<Symbol>]
# @return [void]
def setup(names = [@main])
names.each do |name|
prec = self[name]
prec.setup(self)
end
end
# @param stream [PrecParser::Stream]
# @param check_eof [Boolean]
# @return [Object]
def parse(stream, check_eof: true)
result = parse_prec(@main, stream)
stream.unexpected unless result
if check_eof
stream.expected(['EOF'])
stream.unexpected unless stream.eof?
end
result
end
# @api private
# @param names [Array<Symbol>]
# @param stream [PrecParser::Stream]
# @return [Object]
def parse_precs(names, stream)
names.each do |name|
result = parse_prec(name, stream)
return result if result
end
stream.unexpected
end
# @api private
# @param name [Symbol]
# @param stream [PrecParser::Stream]
# @return [Object, nil]
def parse_prec(name, stream)
prec = self[name]
prec.parse(self, stream)
end
# @api private
# @param parts [Array<String, Symbol>]
# @param stream [PrecParser::Stream]
# @return [Array<PrecParser::Token, Object>]
def parse_parts(parts, stream)
parts.map do |part|
if part.is_a?(Symbol)
result = parse_prec(part, stream)
stream.unexpected unless result
result
else
token = stream.current
unless token.type == part
stream.expected([part])
stream.unexpected
end
stream.next
token
end
end
end
def inspect
result = "DSL.build #{@main.inspect} do\n"
@precs.each do |(name, prec)|
prec.inspect.lines(chomp: true) do |line|
result << " #{line}\n"
end
end
result << "end"
result
end
end
# `Prec` is an operator precedence.
class Prec
# @param name [Symbol]
# @param succs [Array<Symbol>]
# @param ops [Array<Op>]
def initialize(name, succs, ops)
@name = name
@succs = succs
@ops = ops
end
# @return [Symbol]
attr_reader :name
# @return [Array<Symbol>]
attr_reader :succs
# @return [Array<PrecParser::Op>]
attr_reader :ops
# @api private
# @return [Hash{String => PrecParser::Op}]
attr_reader :prefix_table
# @api private
# @return [Hash{String => PrecParser::Op}]
attr_reader :postfix_table
# @api private
# @param grammar [PrecParser::Grammar]
# @return [void]
def setup(grammar)
return unless @prefix_table.nil? && @postfix_table.nil?
@prefix_table = {}
@postfix_table = {}
@ops.each do |op|
case op.type
when :prefix, :non_assoc_prefix, :closed
if @prefix_table.include?(op.key)
raise "Conflict with the key token '#{op.key}'"
end
@prefix_table[op.key] = op
when :postfix, :non_assoc_postfix, :left_assoc, :right_assoc, :non_assoc
if @postfix_table.include?(op.key)
raise "Conflict with the key token '#{op.key}'"
end
@postfix_table[op.key] = op
end
end
grammar.setup(@succs)
@ops.each do |op|
grammar.setup(op.parts.filter { _1.is_a?(Symbol) })
end
@all_prefix_keys = @prefix_table.keys
@all_postfix_keys = @postfix_table.keys
@prefix_keys = @prefix_table.filter { |key, op| op.type == :prefix }.keys
@right_assoc_keys = @postfix_table.filter { |key, op| op.type == :right_assoc }.keys
@postfix_and_left_assoc_keys = @postfix_table.filter { |key, op| op.type == :postfix || op.type == :left_assoc }.keys
end
# @api private
# @param grammar [PrecParser::Grammar]
# @param stream [PrecParser::Stream]
# @return [Object, nil]
def parse(grammar, stream)
op = @prefix_table[stream.current&.type]
if op
key_token = stream.current
stream.next
case op.type
when :prefix
parts = grammar.parse_parts(op.parts, stream)
stack = [[op, key_token, *parts]]
return parse_prefix_and_right_assoc(grammar, stream, stack)
when :non_assoc_prefix
parts = grammar.parse_parts(op.parts, stream)
node = grammar.parse_precs(@succs)
return op.build(key_token, *parts, node)
when :closed
parts = grammar.parse_parts(op.parts, stream)
return op.build(key_token, *parts)
else raise 'Unreachable'
end
end
stream.expected(@all_prefix_keys)
node = grammar.parse_precs(@succs, stream)
key_token = stream.current
op = @postfix_table[key_token&.type]
unless op
stream.expected(@all_postfix_keys)
return node
end
stream.next
case op.type
when :postfix
parts = grammar.parse_parts(op.parts, stream)
node = op.build(node, key_token, *parts)
return parse_postfix_and_left_assoc(grammar, stream, node)
when :non_assoc_postfix
parts = grammar.parse_parts(op.parts, stream)
return op.build(node, key_token, *parts)
when :left_assoc
parts = grammar.parse_parts(op.parts, stream)
right = grammar.parse_precs(@succs, stream)
node = op.build(node, key_token, *parts, right)
return parse_postfix_and_left_assoc(grammar, stream, node)
when :right_assoc
parts = grammar.parse_parts(op.parts, stream)
stack = [[op, node, key_token, *parts]]
return parse_prefix_and_right_assoc(grammar, stream, stack)
when :non_assoc
parts = grammar.parse_parts(op.parts, stream)
right = grammar.parse_precs(@succs, stream)
return op.build(node, key_token, *parts, right)
else raise 'Unreachable'
end
end
# @api private
# @param grammar [PrecParser::Grammar]
# @param stream [PrecParser::Stream]
# @param stack [Array<Array<PrecParser::Op, PrecParser::Token, Object>>]
# @return [Object]
def parse_prefix_and_right_assoc(grammar, stream, stack)
continue = true
node = nil
while continue do
key_token = stream.current
op = @prefix_table[key_token&.type]
while op.is_a?(Op) && op.type == :prefix
stream.next
parts = grammar.parse_parts(op.parts, stream)
stack << [op, key_token, *parts]
key_token = stream.current
op = @prefix_table[key_token.type]
end
stream.expected(@prefix_keys)
node = grammar.parse_precs(@succs, stream)
key_token = stream.current
op = @postfix_table[key_token&.type]
if op.is_a?(Op) && op.type == :right_assoc
stream.next
parts = grammar.parse_parts(op.parts, stream)
stack << [op, node, key_token, *parts]
continue = true
else
stream.expected(@right_assoc_keys)
continue = false
end
end
stack.reverse_each do |(op, *args)|
node = op.build(*args, node)
end
node
end
# @api private
# @param grammar [PrecParser::Grammar]
# @param stream [PrecParser::Stream]
# @param node [Object]
# @return [Object]
def parse_postfix_and_left_assoc(grammar, stream, node)
key_token = stream.current
op = @postfix_table[key_token&.type]
while op.is_a?(Op) && (op.type == :postfix || op.type == :left_assoc)
stream.next
case op.type
when :postfix
parts = grammar.parse_parts(op.parts, stream)
node = op.build(node, key_token, *parts)
when :left_assoc
parts = grammar.parse_parts(op.parts, stream)
right = grammar.parse_precs(@succs, stream)
node = op.build(node, key_token, *parts, right)
end
key_token = stream.current
op = @postfix_table[key_token.type]
end
stream.expected(@postfix_and_left_assoc_keys)
node
end
def inspect
result = "prec #{name.inspect}"
unless succs.empty?
result << " => #{succs.inspect}"
end
unless ops.empty?
result << " do\n"
ops.each do |op|
result << " #{op.inspect}\n"
end
result << "end"
end
result
end
end
# `Op` is an operator.
class Op
# @param type [:prefix, :postfix, :non_assoc_prefix, :non_assoc_postfix, :closed]
# @param type [:left_assoc, :right_assoc, :non_assoc]
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
def initialize(type, key, parts, build)
@type = type
@key = key
@parts = parts
@build = build
end
# @return [:prefix, :postfix, :non_assoc_prefix, :non_assoc_postfix, :closed]
# @return [:left_assoc, :right_assoc, :non_assoc]
attr_reader :type
# @return [String]
attr_reader :key
# @return [Array<String, Symbol>]
attr_reader :parts
# @param args [Array<PrecParser::Token, Object>]
# @return [Object]
def build(*args)
@build.call(*args)
end
def inspect
"#{type} #{([key] + parts).map(&:inspect).join(', ')}"
end
end
# `Stream` is a token stream.
#
# @abstract `current`, `next` and `eof?` must be implemented.
class Stream
def initialize
@expected = []
end
# @return [void]
def next
@expected = []
end
# @param expected [Array<String>]
# @return [void]
def expected(expected)
@expected += expected
end
# @return [void]
def unexpected
token = current&.type || 'EOF'
if @expected.empty?
raise "Unexpected token '#{token}'"
else
raise "Expected token(s) #{@expected.sort.map { |tok| "'#{tok}'" }.join(", ")}, but the unexpected token '#{token}' comes"
end
end
end
# `ArrayStream` is a simple implementation of `PrecParser::Stream` with an array.
class ArrayStream < Stream
# @param tokens [Array<PrecParser::Token>]
def initialize(tokens)
super()
@tokens = tokens
@index = 0
end
# @return [PrecParser::Token, nil]
def current
@tokens[@index]
end
# @return [void]
def next
super
@index += 1
@tokens[@index]
end
# @return [Boolean]
def eof?
@index >= @tokens.size
end
end
# `Token` is a token.
class Token
# @param type [String]
# @param value [Object]
# @param loc [Range, nil]
def initialize(type, value = nil, loc: nil)
@type = type
@value = value
@loc = loc
end
# @return [String]
attr_reader :type
# @return [Object]
attr_reader :value
# @return [Range, nil]
attr_reader :loc
end
# `RegexpLexer` is a lexer using regexp patterns.
class RegexpLexer
# @param skip [Regexp]
# @param pattern [Regexp]
# @param block [Proc]
def initialize(skip:, pattern:, &block)
@skip = /\G#{skip}/
@pattern = /\G#{pattern}/
@block = block
end
# @return [Regexp]
attr_reader :skip
# @return [Regexp]
attr_reader :pattern
# @return [Proc]
attr_reader :block
# @param source [String]
# @return [PrecParser::RegexpStream]
def lex(source)
RegexpStream.new(self, source)
end
end
# @api private
class RegexpStream < Stream
# @param lexer [PrecParser::Lexer]
# @param source [String]
def initialize(lexer, source)
super()
@lexer = lexer
@source = source
@offset = 0
@current = nil
self.next
end
# @return [PrecParser::Token, nil]
def current
@current
end
# @return [void]
def next
super
skip = @source.match(@lexer.skip, @offset)
@offset = skip.end(0) if skip
return nil if eof?
match = @source.match(@lexer.pattern, @offset)
unless match
raise "Unexpected character: #{@source[@offset].inspect}"
end
type, value = @lexer.block.call(match)
loc = match.begin(0)...match.end(0)
@offset = match.end(0)
@current = Token.new(type, value, loc:)
end
# @return [Boolean]
def eof?
@source.size <= @offset
end
end
# `DSL` provides a DSL to construct `PrecParser::Grammar` objects.
# It is also a context of blocks of the `PrecParser::DSL.build` method.
class DSL
# @param main [Symbol]
# @param block [Proc]
# @return [PrecParser::Grammar]
def self.build(main = :main, &block)
context = DSL.new
context.instance_eval(&block)
grammar = Grammar.new(main, context.precs)
grammar.setup
grammar
end
# @api private
def initialize
@precs = {}
end
# @api private
attr_reader :precs
# @param name [Symbol, Hash{Symbol => Symbol}, Hash{Symbol => Array<Symbol>}]
# @return [void]
def prec(name, &block)
name, succs =
if name.is_a?(Hash)
if name.size != 1
raise '`prec` should be `prec name => succs` form'
end
name, succs = name.first
[name, succs.is_a?(Array) ? succs : [succs]]
else
[name, []]
end
if @precs.include?(name)
raise "A prec '#{name}' is already defined"
end
@precs[name] = PrecDSL.build(name, succs, &block)
end
# `PrecDSL` is a context of blocks of the `PrecParser::DSL#prec` method.
class PrecDSL
# @api private
# @param name [Symbol]
# @param succs [Array<Symbol>]
# @param block [Proc]
# @return [PrecParser::]
def self.build(name, succs, &block)
context = PrecDSL.new
context.instance_eval(&block) if block
Prec.new(name, succs, context.ops)
end
# @api private
def initialize
@ops = []
end
# @api private
# @return [Array<PrecParser::Op>]
attr_reader :ops
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def prefix(key, *parts, &build)
@ops << Op.new(:prefix, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def postfix(key, *parts, &build)
@ops << Op.new(:postfix, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def non_assoc_prefix(key, *parts, &build)
@ops << Op.new(:non_assoc_prefix, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def non_assoc_postfix(key, *parts, &build)
@ops << Op.new(:non_assoc_postfix, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def closed(key, *parts, &build)
@ops << Op.new(:closed, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def left_assoc(key, *parts, &build)
@ops << Op.new(:left_assoc, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def right_assoc(key, *parts, &build)
@ops << Op.new(:right_assoc, key, parts, build)
end
# @param key [String]
# @param parts [Array<String, Symbol>]
# @param build [Proc]
# @return [void]
def non_assoc(key, *parts, &build)
@ops << Op.new(:non_assoc, key, parts, build)
end
end
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment