@jozip
Created February 19, 2011 23:18
A simple declarative lexer
# lexer.rb -- A simple declarative lexer
#
# Cobbled together by Johan Persson <johan.z.persson@gmail.com>
#
# Based on Martin Fowler's lexer wrapper in the article
# http://martinfowler.com/bliki/HelloRacc.html
#
# The following example defines a tiny lexer for Erlang lists
# and atoms:
#
#   class MyLexer < Lexer
#     ignores /\s+/
#     tokens  /[a-z][a-zA-Z0-9_.@]*/ => :ATOM,
#             /\[/ => :LBRACKET,
#             /\]/ => :RBRACKET,
#             /,/ => :COMMA
#   end
#
#   lexer = MyLexer.new("[a,b,c]")
#   lexer.next_token # => [:LBRACKET, "["]
#
# Best served with Racc.
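require 'strscan'

class Lexer
  class << self
    # NOTE: @@rules is a class variable, so every Lexer subclass shares
    # the same rule list.
    @@rules = []

    # Patterns whose matches are skipped (e.g. whitespace).
    def ignores(*patterns)
      patterns.each { |pat| @@rules << [pat, nil] }
    end

    # Maps each pattern to the token type it produces.
    def tokens(pairs = {})
      pairs.each { |pat, tok| @@rules << [pat, tok] }
    end

    # Literal keywords; the keyword string doubles as its token type.
    # Escaped so that regexp metacharacters in a keyword match literally.
    def keywords(*strs)
      strs.each { |str| @@rules << [/#{Regexp.escape(str)}/, str] }
    end
  end

  def initialize(base = "")
    @base = StringScanner.new(base)
  end

  # Returns [type, text] for the next token, or [false, false] at the
  # end of input. Ignored matches are skipped.
  def next_token
    loop do
      return [false, false] if @base.eos?
      t = get_token
      return t unless t[0].nil?
    end
  end

  protected

  # Tries the rules in definition order; the first match wins.
  def get_token
    @@rules.each do |pattern, token|
      m = @base.scan(pattern)
      return [token, m] if m
    end
    raise "unexpected characters <#{@base.peek(5)}>"
  end
end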
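
Because `@@rules` is a class variable, every subclass of `Lexer` appends to the same rule list, so two lexers defined in one program would see each other's patterns. One possible way around that is sketched below. It is not part of the original gist, and the names `PerClassLexer`, `ListLexer`, `DigitLexer`, `rules`, `scan_rule` and `to_a` are made up for illustration. The sketch keeps the rules in a class-level instance variable and drains the lexer until it returns `[false, false]`:

```ruby
require 'strscan'

# Hypothetical variant of Lexer: each subclass owns its own rule list.
class PerClassLexer
  class << self
    # Class-level instance variable, so subclasses do not share rules.
    def rules
      @rules ||= []
    end

    def ignores(*patterns)
      patterns.each { |pat| rules << [pat, nil] }
    end

    def tokens(pairs = {})
      pairs.each { |pat, tok| rules << [pat, tok] }
    end
  end

  def initialize(input = "")
    @scanner = StringScanner.new(input)
  end

  # Same contract as Lexer#next_token: [type, text], or [false, false]
  # once the input is exhausted.
  def next_token
    until @scanner.eos?
      type, text = scan_rule
      return [type, text] unless type.nil?
    end
    [false, false]
  end

  # Collects every token up to (not including) the end marker.
  def to_a
    result = []
    while (tok = next_token).first
      result << tok
    end
    result
  end

  private

  # Tries this class's rules in definition order; the first match wins.
  def scan_rule
    self.class.rules.each do |pattern, type|
      text = @scanner.scan(pattern)
      return [type, text] if text
    end
    raise "unexpected characters <#{@scanner.peek(5)}>"
  end
end

class ListLexer < PerClassLexer
  ignores(/\s+/)
  tokens(/[a-z][a-zA-Z0-9_.@]*/ => :ATOM,
         /\[/ => :LBRACKET,
         /\]/ => :RBRACKET,
         /,/  => :COMMA)
end

class DigitLexer < PerClassLexer
  tokens(/\d+/ => :INT)
end

list_tokens = ListLexer.new("[a, b]").to_a
# => [[:LBRACKET, "["], [:ATOM, "a"], [:COMMA, ","], [:ATOM, "b"], [:RBRACKET, "]"]]
```

One trade-off of this sketch is that a subclass of `ListLexer` would start with an empty rule list rather than inheriting its parent's rules; whether that is desirable depends on how the lexers are meant to be composed.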