Skip to content

Instantly share code, notes, and snippets.

@jrunning
Created March 24, 2016 19:07
Show Gist options
  • Save jrunning/29ac867d0fd2f9c1b7ea to your computer and use it in GitHub Desktop.
Save jrunning/29ac867d0fd2f9c1b7ea to your computer and use it in GitHub Desktop.
Regexp.union vs. RegexpTrie.union
require 'benchmark/ips'
require 'regexp_trie'
NUM_WORDS = 1000 # Approximate number of words to be chosen from the dictionary
DICT_PATH = './google-10000-english.txt' # https://github.com/first20hours/google-10000-english
TEXT_PATH = './big.txt' # http://norvig.com/big.txt (~6MB)
num_words_in_dict = `wc -l "#{DICT_PATH}"`.to_i
choose_probability = NUM_WORDS.to_f / num_words_in_dict
# Choose ~num_words words from the dictionary
WORDS = File.foreach(DICT_PATH).with_object([]) do |line, arr|
arr << line.chomp if rand < choose_probability
end
TEXT = File.read(TEXT_PATH)
REPORTS = {
"Regexp" => Regexp.union(WORDS),
"RegexpTrie" => RegexpTrie.union(WORDS),
"Regexp (case ins.)" => Regexp.union(WORDS.map {|word| Regexp.new(word, true) }),
"RegexpTrie (case ins.)" => RegexpTrie.union(WORDS, option: Regexp::IGNORECASE)
}
def search(expr)
TEXT.scan(expr)
end
Benchmark.ips do |x|
REPORTS.each do |title, regexp|
x.report(title) { search(/\b#{regexp}\b/) }
end
x.compare!
end
Calculating -------------------------------------
Regexp 1.000 i/100ms
RegexpTrie 1.000 i/100ms
Regexp (case ins.) 1.000 i/100ms
RegexpTrie (case ins.) 1.000 i/100ms
-------------------------------------------------
Regexp 0.096 (± 0.0%) i/s - 1.000 in 10.417496s
RegexpTrie 1.710 (± 0.0%) i/s - 9.000 in 5.263086s
Regexp (case ins.) 0.070 (± 0.0%) i/s - 1.000 in 14.193575s
RegexpTrie (case ins.) 1.238 (± 0.0%) i/s - 7.000 in 5.653907s
Comparison:
RegexpTrie: 1.7 i/s
RegexpTrie (case ins.): 1.2 i/s - 1.38x slower
Regexp: 0.1 i/s - 17.82x slower
Regexp (case ins.): 0.1 i/s - 24.27x slower
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment