@jordansissel
Created December 17, 2011 20:35
Comparing regexp pattern speeds in MRI Ruby and JRuby
require "cabin"
require "logger"

# Log to STDOUT through a Cabin channel.
logger = Cabin::Channel.new
logger.subscribe(Logger.new(STDOUT))
logger.level = :warn

# Three ways of writing the same quoted-string pattern.
patterns = [
  /('(?:[^\\']+|(?:\\.)+)*')/,
  /('(?:[^\\']+|(?:\\.))*')/,
  /('(?:\\.|[^\\']+)*')/
]

# Run every pattern twice so the second pass is measured after any
# 'warm up' in JRuby, etc.
patterns += patterns

# Note: in a double-quoted Ruby string, \' is just ', so this string
# contains no backslash.
string = "Hello, 'World \'Friend, how are you feeling' ?"
iterations = 10000000

patterns.each do |re|
  timer = logger.time(re.inspect)
  iterations.times do
    if !re.match(string)
      logger.warn("Failed to match?")
    end
  end
  duration = timer.stop
  # Matches per second for this pattern.
  rate = iterations / duration
  puts "Rate: #{rate} #{re.to_s}"
end

What?

I was working on grok and wanted to compare the execution speed of the same pattern written in different ways. I picked the quoted string pattern from grok and went to town.
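As a quick sanity check that the three variants really are the same pattern written differently, the sketch below runs each one against two inputs and compares the captured text. The `escaped` input is a hypothetical addition, not part of the benchmark: it contains a real backslash-escaped quote. The `unescaped` input is the literal used in the script, where `\'` inside double quotes collapses to a plain `'`.

```ruby
# The three variants from the benchmark script.
patterns = [
  /('(?:[^\\']+|(?:\\.)+)*')/,
  /('(?:[^\\']+|(?:\\.))*')/,
  /('(?:\\.|[^\\']+)*')/
]

# Hypothetical input: "\\'" in a double-quoted literal yields a backslash
# followed by a quote, so the quoted string contains an escaped quote.
escaped = "Hello, 'World \\'Friend, how are you feeling' ?"

# The literal from the benchmark: "\'" in double quotes is just a quote,
# so no backslash ever reaches the regexp engine.
unescaped = "Hello, 'World \'Friend, how are you feeling' ?"

# Capture group 1 for every pattern against each input.
escaped_matches = patterns.map { |re| re.match(escaped)[1] }
unescaped_matches = patterns.map { |re| re.match(unescaped)[1] }

# One unique entry per input means all three patterns agree.
puts escaped_matches.uniq.inspect
puts unescaped_matches.uniq.inspect
```

With the escaped input, each pattern should capture the whole quoted string through to the final quote. With the benchmark's literal, each should stop at the second quote and capture only `'World '`, so the benchmark never exercises the escape branch of the patterns.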

While I found no discernible speed difference between the patterns, I did find that JRuby's regexp engine, Joni (a Java port of Oniguruma), beats the pants off of the Oniguruma engine in Ruby 1.9.2.

With some rounding, the average in Ruby 1.9.2 was about 530,000 matches/sec, while JRuby 1.6.5 in 1.9 mode averaged about 690,000 matches/sec.

That's roughly 30% faster in JRuby with no code changes.
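The averages can be rechecked from the raw numbers. This is a minimal sketch that copies the twelve rates from the result listings in this gist; the `mean` helper and the variable names are illustrative, not part of the original script.

```ruby
# Rates (matches/sec) copied from the result listings, both passes.
mri_rates = [
  531456.3673108591, 532358.8482186638, 532718.207312862,
  537258.5459318311, 531908.6117197308, 537653.6067093286
]
jruby_rates = [
  658587.987355111, 685682.940208448, 704374.163555681,
  690941.753610171, 689750.31038764, 681523.887412254
]

# Arithmetic mean of a list of floats (illustrative helper).
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

mri_mean = mean(mri_rates)
jruby_mean = mean(jruby_rates)

# Relative speedup of JRuby over MRI, in percent.
speedup_percent = (jruby_mean / mri_mean - 1.0) * 100

puts "MRI mean:   #{mri_mean.round} matches/sec"
puts "JRuby mean: #{jruby_mean.round} matches/sec"
puts "JRuby is #{speedup_percent.round(1)}% faster"
```

Without rounding, the means come out near 534,000 and 685,000 matches/sec, a speedup of about 28%, which is in line with the rounded figures quoted above.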

Ruby 1.9.2

First pass:

Rate: 531456.3673108591           (?-mix:('(?:[^\\']+|(?:\\.)+)*'))
Rate: 532358.8482186638           (?-mix:('(?:[^\\']+|(?:\\.))*'))
Rate: 532718.207312862            (?-mix:('(?:\\.|[^\\']+)*'))

Second pass:

Rate: 537258.5459318311           (?-mix:('(?:[^\\']+|(?:\\.)+)*'))
Rate: 531908.6117197308           (?-mix:('(?:[^\\']+|(?:\\.))*'))
Rate: 537653.6067093286           (?-mix:('(?:\\.|[^\\']+)*'))

JRuby 1.6.5 / OpenJDK 1.6.0_22 64bit

First pass:

Rate: 658587.987355111           (?-mix:('(?:[^\\']+|(?:\\.)+)*'))
Rate: 685682.940208448           (?-mix:('(?:[^\\']+|(?:\\.))*'))
Rate: 704374.163555681           (?-mix:('(?:\\.|[^\\']+)*'))

Second pass:

Rate: 690941.753610171           (?-mix:('(?:[^\\']+|(?:\\.)+)*'))
Rate: 689750.31038764            (?-mix:('(?:[^\\']+|(?:\\.))*'))
Rate: 681523.887412254           (?-mix:('(?:\\.|[^\\']+)*'))