
@spundun
Last active August 29, 2015 14:07
Treetop Nested Non-Unique Delimiters

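This gist demonstrates a problem with the example in the Treetop documentation at http://cjheath.github.io/treetop/pitfalls_and_advanced_techniques.html: the em rule opens and closes with '**' instead of a single '*', so neither of the sample inputs below parses.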

Matching Nested Structures With Non-Unique Delimiters

Say I want to parse a diabolical wiki syntax in which the following interpretations apply.

```
** *hello* ** --> <strong><em>hello</em></strong>
* **hello** * --> <em><strong>hello</strong></em>
```

```
rule strong
  '**' (em / !'*' . / '\*')+ '**'
end

rule em
  '**' (strong / !'*' . / '\*')+ '**'
end
```

Emphasized text is allowed within strong text by virtue of em being the first alternative. Since em will only successfully parse if a matching * is found, it is permitted, but other than that, no * characters are allowed unless they are escaped.
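To make the intended behaviour concrete, here is a minimal recursive-descent sketch in plain Ruby (no Treetop involved) that mirrors the two rules, assuming em is delimited by a single `*`, which is what the paragraph above describes. `WikiSketch` and `to_html` are hypothetical names. Each rule returns the HTML it matched plus the new offset, or nil; returning nil without advancing mirrors PEG backtracking, and trying the nested rule before `!'*' .` mirrors the ordered choice that lets emphasis sit inside strong text.

```ruby
# Minimal recursive-descent sketch of the strong/em rules, ASSUMING em is
# delimited by a single '*'. WikiSketch is a hypothetical name; this is plain
# Ruby, not Treetop. Each rule returns [html, new_offset] or nil.
class WikiSketch
  def initialize(input)
    @input = input
  end

  # body <- strong / em, and the whole input must be consumed
  def to_html
    result = strong(0) || em(0)
    unless result && result[1] == @input.size
      raise ArgumentError, "no parse for #{@input.inspect}"
    end
    result[0]
  end

  private

  # Returns the offset after str if it matches at pos, else nil.
  def literal(str, pos)
    @input[pos, str.size] == str ? pos + str.size : nil
  end

  # strong <- '**' (em / !'*' . / '\*')+ '**'
  def strong(pos)
    wrapped('**', pos, :em, 'strong')
  end

  # em <- '*' (strong / !'*' . / '\*')+ '*'   (single '*' assumed)
  def em(pos)
    wrapped('*', pos, :strong, 'em')
  end

  # delim element+ delim, rendered as <tag>...</tag>
  def wrapped(delim, pos, inner_rule, tag)
    pos = literal(delim, pos) or return nil
    parts = []
    while (item = element(pos, inner_rule))
      parts << item[0]
      pos = item[1]
    end
    return nil if parts.empty? # '+' needs at least one element
    pos = literal(delim, pos) or return nil
    ["<#{tag}>#{parts.join}</#{tag}>", pos]
  end

  # Ordered choice: nested rule first, then any non-'*' character,
  # then the escape '\*'.
  def element(pos, inner_rule)
    if (nested = send(inner_rule, pos))
      nested
    elsif pos < @input.size && @input[pos] != '*'
      [@input[pos], pos + 1]
    elsif (after = literal('\*', pos))
      # Kept for fidelity to the rule; unreachable in practice, because the
      # branch above already consumes a backslash as an ordinary character.
      ['*', after]
    end
  end
end

puts WikiSketch.new('** *hello* **').to_html
puts WikiSketch.new('* **hello** *').to_html
```

The spaces in the sample inputs are ordinary characters under these rules, so they survive into the output (`<strong> <em>hello</em> </strong>` and `<em> <strong>hello</strong> </em>`). The sketch also keeps the documented order of alternatives, which shows a second quirk: `!'*' .` consumes a backslash before `'\*'` is ever tried, so the escape alternative cannot match as written; listing `'\*'` ahead of `!'*' .` would let escapes through.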

Running `rake` with the files below, which use these rules verbatim, gives the following error:

```
$ rake
Expected one of **, \* at line 1, column 4 (byte 4) after ** 
1
4
rake aborted!
Exception: Parse error at offset: 0
/Users/spundun/Documents/code_samples/treetop_nested_non_unique_delimiters/parser.rb:24:in `parse'
/Users/spundun/Documents/code_samples/treetop_nested_non_unique_delimiters/Rakefile:6:in `block in <top (required)>'
/Users/spundun/.rvm/gems/ruby-2.1.2/bin/ruby_executable_hooks:15:in `eval'
/Users/spundun/.rvm/gems/ruby-2.1.2/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => default => test
(See full trace by running task with --trace)
$ 
```
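A small hand-written recognizer in plain Ruby can reproduce this error. The sketch below (hypothetical class name `FailureSketch`) mirrors the rules as quoted, with em delimited by `'**'`, and records the furthest offset at which a literal failed together with every literal expected there. That is a simplified version of the furthest-failure bookkeeping a PEG parser such as Treetop uses for messages like the one above, not Treetop's actual code. The `em_delim:` keyword is an assumption added only so the single-`*` version can be compared.

```ruby
# Recognizer for the strong/em rules as quoted, with em delimited by '**'
# unless em_delim says otherwise. It remembers the furthest offset where a
# literal failed and every literal expected there. FailureSketch is a
# hypothetical name; this is plain Ruby, not Treetop.
class FailureSketch
  attr_reader :max_fail, :expected

  def initialize(input, em_delim: '**')
    @input = input
    @em_delim = em_delim
    @max_fail = -1
    @expected = []
  end

  # body <- strong / em, and the whole input must be consumed
  def parses?
    (strong(0) || em(0)) == @input.size
  end

  private

  # Match str at pos; on failure, record it if pos is the furthest failure so far.
  def literal(str, pos)
    return pos + str.size if @input[pos, str.size] == str

    if pos > @max_fail
      @max_fail = pos
      @expected = [str]
    elsif pos == @max_fail && !@expected.include?(str)
      @expected << str
    end
    nil
  end

  # strong <- '**' (em / !'*' . / '\*')+ '**'
  def strong(pos)
    wrapped('**', pos, :em)
  end

  # em <- em_delim (strong / !'*' . / '\*')+ em_delim
  def em(pos)
    wrapped(@em_delim, pos, :strong)
  end

  # delim element+ delim; returns the end offset or nil
  def wrapped(delim, pos, inner_rule)
    pos = literal(delim, pos) or return nil
    count = 0
    while (nxt = element(pos, inner_rule))
      pos = nxt
      count += 1
    end
    count.zero? ? nil : literal(delim, pos)
  end

  # Ordered choice: nested rule, then any non-'*' character, then '\*'.
  def element(pos, inner_rule)
    send(inner_rule, pos) ||
      (pos < @input.size && @input[pos] != '*' ? pos + 1 : nil) ||
      literal('\*', pos)
  end
end

sketch = FailureSketch.new('** *hello* **')
p sketch.parses?
p [sketch.max_fail + 1, sketch.expected] # 1-based column and expected literals
p FailureSketch.new('** *hello* **', em_delim: '*').parses?
```

Under the quoted rules, `'** *hello* **'` fails with the furthest failure at offset 3 (column 4) and `**` and `\*` expected, which lines up with the log: strong gets as far as `** ` and then meets a lone `*` that no alternative accepts. With `em_delim: '*'`, both sample inputs are accepted.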
Gemfile

```ruby
gem 'rake'
gem 'pry'
gem 'treetop'
```
Gemfile.lock

```
GEM
  specs:
    coderay (1.1.0)
    method_source (0.8.2)
    polyglot (0.3.5)
    pry (0.10.1)
      coderay (~> 1.1.0)
      method_source (~> 0.8.1)
      slop (~> 3.4)
    rake (10.3.2)
    slop (3.6.0)
    treetop (1.5.3)
      polyglot (~> 0.3)

PLATFORMS
  ruby

DEPENDENCIES
  pry
  rake
  treetop
```
nested_nonunique.treetop

```
grammar NestedNonunique
  rule body
    strong / em
  end

  rule strong
    '**' (em / !'*' . / '\*')+ '**'
  end

  rule em
    '**' (strong / !'*' . / '\*')+ '**'
  end
end
```
parser.rb

```ruby
require 'treetop'

class Parser
  # Load the Treetop grammar from the 'nested_nonunique.treetop' file and
  # create a single parser instance as a class variable, so we don't have
  # to re-create it every time we need to parse a string.
  base_path = File.expand_path(File.dirname(__FILE__))
  Treetop.load(File.join(base_path, 'nested_nonunique.treetop'))
  @@parser = NestedNonuniqueParser.new

  def self.parse(data)
    tree = @@parser.parse(data)

    # A nil tree means parsing failed: print Treetop's diagnostics
    # (reason, line, column) to help the user, then raise.
    if tree.nil?
      puts @@parser.failure_reason
      puts @@parser.failure_line
      puts @@parser.failure_column
      raise Exception, "Parse error at offset: #{@@parser.index}"
    end

    tree
  end
end
```
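The exception in the log says offset 0 while Treetop's own report says column 4. By the time `parse` returns nil, the `index` interpolated into the exception message is back at the start of the input, whereas the failure reason, line and column are computed from Treetop's `failure_index` (3 here), so passing `@@parser.failure_index` to the exception would report the more useful position. The helper below is a plain-Ruby sketch, with the hypothetical name `line_and_column`, of the offset-to-line/column arithmetic behind the 1-based numbers in the log; it is not Treetop's own code.

```ruby
# Hypothetical helper: convert a 0-based offset into 1-based [line, column],
# the convention used by the "line 1, column 4" message in the log.
def line_and_column(input, offset)
  before = input[0, offset]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  # Column counts from the character after the last newline (or from the start).
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

p line_and_column('** *hello* **', 3) # => [1, 4]
p line_and_column("ab\ncd", 4)        # => [2, 2]
```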
Rakefile

```ruby
require_relative 'parser'

task default: [:test]

task :test do
  p Parser.parse("** *hello* **")
  p Parser.parse("* **hello** *")
end
```

cjheath commented Nov 3, 2014

Thanks, applied. Sorry it took a while to get to it.
