@andykingking
Last active December 27, 2015 04:09
Rough implementation of the Sørensen index of two strings
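For reference, the quantity that every version below computes is the Sørensen (Sørensen–Dice) coefficient. Here $A$ and $B$ stand for the bigram collections of the two strings; the symbols are introduced only for this note and do not appear in the code.

$$
QS = \frac{2\,|A \cap B|}{|A| + |B|}
$$

In the code, the intersection is taken with `Array#&`, which returns each shared bigram once, while the denominator counts every bigram including repeats. That is probably part of what makes this a "rough" implementation.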
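# Sørensen index of two strings, computed over their character bigrams.
def sørensen_index(string_a, string_b)
  # get_bigrams consumes its argument, so pass copies.
  matches_a = get_bigrams string_a.dup
  matches_b = get_bigrams string_b.dup
  # Array#& keeps each bigram common to both arrays once.
  similarities = matches_a & matches_b
  sum_bigrams = matches_a.count + matches_b.count
  2 * similarities.count / sum_bigrams.to_f
end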
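
# Collects every pair of adjacent characters, shortening str as it goes.
def get_bigrams(str)
  bigrams = []
  while str.length > 1 do
    bigrams << str[0..1]
    str[0] = '' # drop the first character
  end
  bigrams
end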

class SorensenIndex
  def initialize(*strings)
    @bigram_sets = strings.map {|s| BigramSet.new s}
  end

  def similarities
    @bigram_sets.first & @bigram_sets.last
  end

  def total
    @bigram_sets.first.count + @bigram_sets.last.count
  end

  def calculate
    2 * similarities.count / total.to_f
  end
end

class BigramSet
  attr_reader :bigrams

  def initialize(string)
    get_bigrams string.dup
  end

  def &(alt_set)
    @bigrams & alt_set.bigrams
  end

  def count
    @bigrams.count
  end

  private

  def get_bigrams(str)
    @bigrams = []
    while str.length > 1 do
      @bigrams << str[0..1]
      str[0] = ''
    end
  end
end
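
# Reopens String so that Enumerable#each_cons can walk its characters.
# Note that this adds all of Enumerable to every String in the program.
class String
  include Enumerable
  alias_method :each, :each_char

  # Bigrams as two-element character arrays, e.g. [["a", "b"], ["b", "c"]].
  def bigrams
    each_cons(2).to_a
  end
end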

class SorensenIndex
  def initialize(*strings)
    @bigram_sets = strings.map {|s| s.bigrams}
  end

  def calculate
    2 * similarities.count / total.to_f
  end

  class << self
    def calculate(*strings)
      # Splat the array so initialize receives the strings themselves,
      # not a single array argument.
      SorensenIndex.new(*strings).calculate
    end
  end

  private

  def similarities
    @bigram_sets.first & @bigram_sets.last
  end

  def total
    @bigram_sets.first.count + @bigram_sets.last.count
  end
end
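
A quick way to sanity-check any of the versions above is a small worked example. The block below is a minimal standalone sketch, not the gist author's code: `bigrams_of` and `sorensen_sketch` are hypothetical names, and returning `0.0` when neither string has a bigram is an assumption added here (the versions above would divide by zero and yield `NaN` in that case). It keeps the same `Array#&` intersection, so it should agree with them on ordinary input.

```ruby
# Hypothetical helper (not in the gist): adjacent character pairs of a string.
def bigrams_of(str)
  str.each_char.each_cons(2).map(&:join)
end

# Sørensen index over bigram lists, using Array#& as the versions above do.
# Assumption: strings too short to contain a bigram score 0.0 instead of NaN.
def sorensen_sketch(a, b)
  x = bigrams_of(a)
  y = bigrams_of(b)
  total = x.length + y.length
  return 0.0 if total.zero?

  2.0 * (x & y).length / total
end

# "night" -> ni ig gh ht, "nacht" -> na ac ch ht: one shared bigram out of 4 + 4.
puts sorensen_sketch("night", "nacht") # 2 * 1 / 8
```

Identical strings of at least two characters with no repeated bigram score 1.0, and strings with no bigram in common score 0.0.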