Skip to content

Instantly share code, notes, and snippets.

@petervandenabeele
Last active December 26, 2015 13:49
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save petervandenabeele/7161464 to your computer and use it in GitHub Desktop.
Save petervandenabeele/7161464 to your computer and use it in GitHub Desktop.

Starting with this email on @lrug mailing list:

http://lists.lrug.org/pipermail/chat-lrug.org/2013-October/009583.html

Eventually, 11 different implementations where proposed (code below). I adapted them to a Rspec spec and ran correctness and (naive) performance tests on them.

➜  spec git:(master) ✗ ls -lrt *.rb
-rw-r--r--  1 peter_v  staff   825 Oct 25 22:31 roland_spec.rb
# originally michael_spec was here, but updated to fix my mistake
-rw-r--r--  1 peter_v  staff  1515 Oct 25 22:33 peter_spec.rb
-rw-r--r--  1 peter_v  staff   825 Oct 25 22:34 tom_spec.rb
-rw-r--r--  1 peter_v  staff  1036 Oct 25 22:43 dominic_spec.rb
-rw-r--r--  1 peter_v  staff   913 Oct 26 08:20 michael_spec.rb
-rw-r--r--  1 peter_v  staff   909 Oct 26 08:41 jason_spec.rb
-rw-r--r--  1 peter_v  staff  2237 Oct 26 08:42 peter2_spec.rb
-rw-r--r--  1 peter_v  staff  1268 Oct 26 09:56 dave_spec.rb
-rw-r--r--  1 peter_v  staff  2166 Oct 26 12:50 peter3_spec.rb
-rw-r--r--  1 peter_v  staff  1010 Oct 28 09:19 dominic_recursion.rb
-rw-r--r--  1 peter_v  staff   966 Oct 28 09:19 dominic_lists.rb

All 11 now produce the expected results (variation allowed in example 0 and example 2).

➜ spec git:(master) ✗ rspec roland_spec.rb
.....
 
Finished in 0.00101 seconds
5 examples, 0 failures

➜  spec git:(master) ✗ rspec michael_spec.rb 
.....

Finished in 0.00111 seconds
5 examples, 0 failures

➜ spec git:(master) ✗ rspec peter_spec.rb
.....
 
Finished in 0.00101 seconds
5 examples, 0 failures

➜ spec git:(master) ✗ rspec tom_spec.rb
.....
 
Finished in 0.00118 seconds
5 examples, 0 failures

➜  spec git:(master) ✗ rspec dominic_spec.rb 
.....

Finished in 0.00104 seconds
5 examples, 0 failures

➜  spec git:(master) ✗ rspec jason_spec.rb 
.....

Finished in 0.00086 seconds
5 examples, 0 failures

➜  spec git:(master) ✗ rspec peter2_spec.rb 
...........

Finished in 0.00691 seconds
11 examples, 0 failures

➜  spec git:(master) ✗ rspec dave_spec.rb 
.....

Finished in 0.00105 seconds
5 examples, 0 failures

➜  spec git:(master) ✗ rspec peter3_spec.rb  
..........

Finished in 0.00185 seconds
10 examples, 0 failures

➜  spec git:(master) ✗ rspec dominic_recursion.rb 
......

Finished in 0.00112 seconds
6 examples, 0 failures

➜  spec git:(master) ✗ rspec dominic_lists.rb    
.....

Finished in 0.00097 seconds
5 examples, 0 failures

UPDATE: a proper benchmark is published here: https://gist.github.com/textgoeshere/02fd396e56ca8a8d2efe

UPDATE: since I cannot post comments on that gist, an update of fruity test for the 4 fastest contenders is shown here (I print the code below for reference):

➜  fruity git:(master) ✗ # below: multiplier 50, pattern 'lod'       
➜  fruity git:(master) ✗ ruby letters-benchmarking.rb
Running each test once. Test will take about 17 seconds.
Peter3 is faster than DominicLists by 30.000000000000004% ± 10.0%
DominicLists is faster than Dave by 30.000000000000004% ± 10.0%
Dave is faster than Jason by 10x ± 1.0

➜  fruity git:(master) ✗ # below : multiplier 25, pattern 'lold' 
➜  fruity git:(master) ✗ ruby letters-benchmarking.rb 
Running each test once. Test will take about 1 minute.
DominicLists is faster than Peter3 by 19.999999999999996% ± 1.0%
Peter3 is faster than Dave by 10.000000000000009% ± 10.0%
Dave is faster than Jason by 4x ± 0.1

As an extremely naive speed test, changing the test string to ("hello world" * 100) and running rspec again on all 10 yields (seconds, lines of code, and max RSIZE as seen in top -o cpu on MacBook Pro).

roland_spec.rb       =>   9.14 s  11 loc (1035 MB max RSIZE)
michael_spec.rb      =>   8.82 s  13 loc (1044 MB max RSIZE)
peter_spec.rb        =>   5.91 s  30 loc ( 244 MB max RSIZE)
tom_spec.rb          =>  20.05 s   9 loc ( 326 MB max RSIZE)
dominic_spec.rb      => 395.   s  14 loc ( 434 MB max RSIZE)
jason_spec.rb        =>  16.21 s  19 loc ( 535 MB max RSIZE)
peter2_spec.rb       =>   5.24 s  42 loc ( 222 MB max RSIZE)
dave_spec.rb         =>   4.82 s  19 loc ( 323 MB max RSIZE) 
peter3_spec.rb       =>   4.42 s  39 loc ( 239 MB max RSIZE)
dominic_recursion.rb =>  64.47 s  10 loc ( 482 MB max RSIZE)
dominic_lists.rb     =>   4.21 s  10 loc ( 195 MB max RSIZE)

Dominic fixed the tail handling of dominic_spec.rb into dominic_recursion.rb.

Of course, the Rspec fails on the ("hello world" * 100) test case, but the reported results make sense: the last 2 found results are always: [1092, 1093, 1099], [1092, 1096, 1099] which are indeed the last 2 'lod' combinations in "hello world" * 100.

➜ spec git:(master) ✗ cat roland_spec.rb
def letters(a, b)
  indexes = b.each_char.map do |letter|
    a.each_char.each_with_index.select {|l, _| l == letter }.map {|_, i| i }
  end

  start = indexes.shift

  if indexes.empty?
    [start]
  else
    start.product(*indexes).select {|x| x == x.sort }
  end
end

# NOTE: these are the tests run against all implementations
#       replaced by '...' in prints below.

describe "letters" do
  let(:a) { "hello world" * 100 }

  specify "example 1" do
    letters(a, "e").should == [[1]]
  end

  specify "example 2" do
    letters(a, "l").should == [[2, 3, 9]]
  end

  specify "example 3" do
    letters(a, "el").should == [[1,2], [1,3], [1,9]]
  end

  specify "example 4" do
    letters(a, "lo").should == [[2,4], [2,7], [3,4], [3,7]]
  end

  specify "example 5" do
    letters(a, "lod").should == [ [2,4,10], [2,7,10], [3,4,10], [3,7,10] ]
  end
end
➜  spec git:(master) ✗ cat michael_spec.rb 
def get_indexes(needle, haystack)
  (0...haystack.length).find_all { |i| haystack[i,1] == needle }
end

def map_indexes(needle, haystack)
  needle.split('').map do |nee|
    get_indexes nee, haystack
  end
end

def letters(haystack, needle)
  indexes = map_indexes(needle, haystack)
  first_index = indexes.shift
  first_index.product(*indexes).select{|match| match == match.sort}
end
...
➜ spec git:(master) ✗ cat peter_spec.rb
def letters(text, pattern)
  start = 0
  result = find_all_from_pattern(text, start, pattern)
end

def find_all_from_pattern(text, from, pattern)
  letter = pattern[0]
  remainder = pattern[1..-1]
  r = find_all_from_letter(text, from, letter)
  if remainder.size > 0
    r.each_with_object([]) do |previous, results|
      one_deeper = find_all_from_pattern(text, (previous+1), remainder)
      one_deeper.each do |deeper|
        results  << ([previous] + deeper)
      end
    end
  else # leaf node, end of iteration
    r
  end.map{ |e| Array(e) }
end

def find_all_from_letter(text, from, letter)
  [].tap do |result|
    while(match = find_one_from_letter(text, from, letter))
      result << match
      from = match + 1
    end
  end
end

def find_one_from_letter(text, from, letter)
  return nil unless letter
  index = text[from..-1].index(letter)
  index && (index + from)
end
...
➜ spec git:(master) ✗ cat tom_spec.rb
def letters(a, b, starting_at = 0)
  if b.empty?
    [[]]
  else
    a.each_char.with_index.drop(starting_at).
      select { |c, _| c == b[0] }.map(&:last).
      flat_map { |i| letters(a, b[1..-1], i.succ).map(&[i].method(:+)) }
  end
end
...
➜  spec git:(master) ✗ cat dominic_spec.rb
def letters(haystack, needles)
  # base case: if the letters to match are empty, we've succeeded
  return [[]] if needles.empty?

  # base case: if the word to match in is empty, we've failed
  return false if haystack.empty?

  result = []

  if needles[0] == haystack[0]
    tail = letters(haystack[1..-1], needles[1..-1])
    result += tail.map { |t| [0] + t.map { |i| i + 1 } } if tail
  end

  tail = letters(haystack[1..-1], needles)
  result += tail.map { |t| t.map { |i| i + 1 } } if tail

  result
end
...
➜  spec git:(master) ✗ cat jason_spec.rb 
def get_results(string, s_i, chars, c_i )
  finds = []
  while i = string.index(chars[c_i], s_i)
    if chars[c_i + 1]
      if more_finds = get_results(string, i + 1, chars , c_i + 1)
        finds += more_finds.map{|a| a.unshift(i)}
      else
        return nil
      end
    else
      finds << [i]
    end
    s_i = i + 1
  end
  return finds
end

def letters(a, b)
  return get_results(a, 0, b.split(//), 0 )
end
...
➜  spec git:(master) ✗ cat peter2_spec.rb 
def find_all_from_pattern(text_or_rainbow, pattern, from = 0)
  rainbow = build_rainbow(text_or_rainbow)
  letter = pattern[0]
  remainder = pattern[1..-1]
  matches = find_all_from_letter(rainbow, from, letter)
  if remainder.empty?
    matches # leaf node
  else
    find_remainder(rainbow, matches, remainder)
  end.map{ |e| Array(e) }
end

alias :letters :find_all_from_pattern

def find_remainder(rainbow, previous_matches, remainder)
  previous_matches.each_with_object([]) do |previous, results|
    find_all_from_pattern(rainbow, remainder, (previous+1)).each do |deeper|
      results  << ([previous] + deeper)
    end
  end
end

def find_all_from_letter(rainbow, from, letter)
  [].tap do |result|
    while(match = find_one_from_letter(rainbow, from, letter))
      result << match
      from = match + 1
    end
  end
end

def find_one_from_letter(rainbow, from, letter)
  index = rainbow[from].index(letter)
  index && (index + from)
end

def build_rainbow(shortened)
  return shortened if shortened.is_a?(Array)
  [].tap do |rainbow|
    until (shortened.empty?)
      rainbow.push(shortened)
      shortened = shortened[1..-1]
    end
    rainbow.push('')
  end
end
...
➜  spec git:(master) ✗ cat dave_spec.rb 
def letters(haystack_str, needle_str)
  haystack = haystack_str.chars
  needle = needle_str.chars

  @pos_cache = haystack.each.with_index.with_object(Hash.new { |h, k| h[k] = [] }) { |(c, i), h| h[c] << i if needle.include?(c) }
  @match_cache = Hash.new { |h, k| h[k] = {} }
  matches(needle)
end

def matches(chars, current_pos = -1)
  return [[]] unless chars.any?

  char, *rest = *chars
  @match_cache[char][current_pos] ||= begin
    this_matches = @pos_cache[char].select { |candidate_pos| candidate_pos > current_pos }
    this_matches.each_with_object([]) do |this_pos, memo|
      matches(rest, this_pos).each do |rest_matches|
        memo << [this_pos, *rest_matches]
      end
    end
  end
end
...
➜  spec git:(master) ✗ cat peter3_spec.rb
def find_all_from_pattern(text, pattern, from = 0)
  @reverse_index ||= reverse_index(text)
  @cache ||= {}
  letter = pattern[0]
  remainder = pattern[1..-1]
  matches = @reverse_index[letter].select{ |i| i >= from }
  if remainder.empty?
    matches # leaf node
  else
    find_remainder(matches, remainder)
  end.map{ |e| Array(e) }
end

alias :letters :find_all_from_pattern

def find_remainder(previous_matches, remainder)
  previous_matches.each_with_object([]) do |previous, results|
    find_all_remaining_after_previous(remainder, previous, results)
  end
end

def find_all_remaining_after_previous(remainder, previous, results)
  with_cache(remainder, previous) do |_remainder, _previous|
    find_all_from_pattern(nil, _remainder, (_previous+1))
  end.each do |deeper|
    results  << ([previous] + deeper)
  end
end

def reverse_index(haystack)
  Hash.new{ |h,k| h[k] = [] }.tap do |ri|
    haystack.each_char.with_index do |c, i|
      ri[c] << i
    end
  end
end

MAX_REMAINDER_FOR_CACHE = 1

def with_cache(remainder, previous)
  if remainder.size <= MAX_REMAINDER_FOR_CACHE
    @cache[remainder.hash ^ previous.hash] ||= yield(remainder, previous)
  else
    yield(remainder, previous)
  end
end
...
➜  spec git:(master) ✗ cat dominic_recursion.rb 
def letters(haystack, needles, index = 0)
  # base case: if the letters to match are empty, we've succeeded
  return [[]] if needles.empty?

  # base case: if the word to match in is empty, we've failed
  return [] if haystack.empty?

  result = []

  if needles[0] == haystack[0]
    tail = letters(haystack[1..-1], needles[1..-1], index + 1)
    result += tail.map { |t| [index] + t }
  end

  result + letters(haystack[1..-1], needles, index + 1)
end
...
➜  spec git:(master) ✗ cat dominic_lists.rb
 def letters(haystack, needles)
   indices = Hash[haystack.each_char.with_index.group_by { |(char, _)| char }.map { |(char, values)| [char, values.map { |(_, index)| index }] }]

   results = indices[needles[0]].map { |i| [i] }
   needles[1..-1].each_char do |char|
     results = results.flat_map do |r|
       indices[char].drop_while { |i| i < r.last }.map { |i| r + [i] }
     end
   end

   results
end
...

For reference, my version of the code for fruity. Adapted from the original by Dave on https://gist.github.com/textgoeshere/02fd396e56ca8a8d2efe

  fruity git:(master)  cat letters-benchmarking.rb 
require 'fruity'

module Roland
  def self.letters(a, b)
    indexes = b.each_char.map do |letter|
      a.each_char.each_with_index.select {|l, _| l == letter }.map {|_, i| i }
    end

    start = indexes.shift

    if indexes.empty?
      [start]
    else
      start.product(*indexes).select {|x| x == x.sort }
    end
  end
end

module Tom
  def self.letters(a, b, starting_at = 0)
    if b.empty?
      [[]]
    else
      a.each_char.with_index.drop(starting_at).
        select { |c, _| c == b[0] }.map(&:last).
        flat_map { |i| letters(a, b[1..-1], i.succ).map(&[i].method(:+)) }
    end
  end
end

module DominicLists
 def self.letters(haystack, needles)
   indices = Hash[haystack.each_char.with_index.group_by { |(char, _)| char }.map { |(char, values)| [char, values.map { |(_, index)| index }] }]

   results = indices[needles[0]].map { |i| [i] }
   needles[1..-1].each_char do |char|
     results = results.flat_map do |r|
       indices[char].drop_while { |i| i < r.last }.map { |i| r + [i] }
     end
   end

   results
  end
end

module Dave
  extend self

  def letters(haystack_str, needle_str)
    haystack = haystack_str.chars
    needle   = needle_str.chars

    @pos_cache   = haystack.each.with_index.with_object(Hash.new { |h, k| h[k] = [] }) { |(c, i), h| h[c] << i if needle.include?(c) }
    @match_cache = Hash.new { |h, k| h[k] = {} }
    matches(needle)
  end

  def matches(chars, current_pos = -1)
    return [[]] unless chars.any?

    char, *rest = *chars
    @match_cache[char][current_pos] ||= begin
        this_matches = @pos_cache[char].select { |candidate_pos| candidate_pos > current_pos }
        this_matches.each_with_object([]) do |this_pos, memo|
        matches(rest, this_pos).each do |rest_matches|
          memo << [this_pos, *rest_matches]
        end
      end
    end
  end
end

module Peter
  extend self

  def letters(text, pattern)
    start = 0
    result = find_all_from_pattern(text, start, pattern)
  end

  def find_all_from_pattern(text, from, pattern)
    letter = pattern[0]
    remainder = pattern[1..-1]
    r = find_all_from_letter(text, from, letter)
    if remainder.size > 0
      r.each_with_object([]) do |previous, results|
        one_deeper = find_all_from_pattern(text, (previous+1), remainder)
        one_deeper.each do |deeper|
          results  << ([previous] + deeper)
        end
      end
    else # leaf node, end of iteration
      r
    end.map{ |e| Array(e) }
  end

  def find_all_from_letter(text, from, letter)
    [].tap do |result|
      while(match = find_one_from_letter(text, from, letter))
        result << match
        from = match + 1
      end
    end
  end

  def find_one_from_letter(text, from, letter)
    return nil unless letter
    index = text[from..-1].index(letter)
    index && (index + from)
  end
end

module Peter3
  extend self

  def find_all_from_pattern(text, pattern, from = 0)
    @reverse_index ||= reverse_index(text)
    @cache ||= {}
    letter = pattern[0]
    remainder = pattern[1..-1]
    matches = @reverse_index[letter].select{ |i| i >= from }
    if remainder.empty?
      matches # leaf node
    else
      find_remainder(matches, remainder)
    end.map{ |e| Array(e) }
  end

  alias :letters :find_all_from_pattern

  def find_remainder(previous_matches, remainder)
    previous_matches.each_with_object([]) do |previous, results|
      find_all_remaining_after_previous(remainder, previous, results)
    end
  end

  def find_all_remaining_after_previous(remainder, previous, results)
    with_cache(remainder, previous) do |_remainder, _previous|
      find_all_from_pattern(nil, _remainder, (_previous+1))
    end.each do |deeper|
      results  << ([previous] + deeper)
    end
  end

  def reverse_index(haystack)
    Hash.new{ |h,k| h[k] = [] }.tap do |ri|
      haystack.each_char.with_index do |c, i|
        ri[c] << i
      end
    end
  end

  MAX_REMAINDER_FOR_CACHE = 2

  def with_cache(remainder, previous)
    if remainder.size <= MAX_REMAINDER_FOR_CACHE
      @cache[remainder.hash ^ previous.hash] ||= yield(remainder, previous)
    else
      yield(remainder, previous)
    end
  end

end

module Jason
  extend self

  def get_results(string, s_i, chars, c_i )
    finds = []
    while i = string.index(chars[c_i], s_i)
      if chars[c_i + 1]
        if more_finds = get_results(string, i + 1, chars , c_i + 1)
          finds += more_finds.map{|a| a.unshift(i)}
        else
          return nil
        end
      else
        finds << [i]
      end
      s_i = i + 1
    end
    return finds
  end

  def letters(a, b)
    return get_results(a, 0, b.split(//), 0 )
  end
end

haystack = "hello world" * 25
needle   = "lold"

contendors = [Dave, Jason, Peter3, DominicLists]
compare(contendors.each_with_object({}) { |c, h| h[c] = -> { c.letters(haystack, needle) } })
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment