@thejefflarson
Created January 25, 2013 17:19
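require 'set'

# Build the n-gram that starts at each token position, joined by spaces.
# Grams that start within the last n - 1 tokens are shorter than n.
def ngrams(n)
  toks = tokenize
  toks.each_index.map { |i| toks.slice(i, n).join(' ') }
end

# XOR the 10 smallest hashes of the distinct 9-grams into one fingerprint.
# String#hash is seeded per Ruby process, so values are not comparable across runs.
def sim_hash!
  gramz = Set.new(ngrams(9))
  ordered = gramz.map { |g| g.hash }.sort
  return if ordered.length < 10
  self.sim_hash = ordered.first(10).reduce(0) { |fingerprint, h| fingerprint ^ h }
end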
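
The two methods above rely on a `tokenize` method and a `sim_hash=` writer that the gist does not show. The following is a minimal sketch of how they might be hosted and called. The `Document` class, its whitespace tokenizer, and the sample strings are all assumptions made for illustration, not part of the original code.

```ruby
require 'set'

# Hypothetical host class: the gist does not show where `tokenize` or
# `sim_hash=` come from, so both are assumptions made for this sketch.
class Document
  attr_accessor :text, :sim_hash

  def initialize(text)
    @text = text
  end

  # Assumed tokenizer: lowercase the text and split on whitespace.
  def tokenize
    text.downcase.split
  end

  # One gram per token position; trailing grams are shorter than n.
  def ngrams(n)
    toks = tokenize
    toks.each_index.map { |i| toks.slice(i, n).join(' ') }
  end

  # XOR the 10 smallest hashes of the distinct 9-grams.
  # Leaves sim_hash unset when there are fewer than 10 distinct grams.
  def sim_hash!
    ordered = Set.new(ngrams(9)).map(&:hash).sort
    return if ordered.length < 10
    self.sim_hash = ordered.first(10).reduce(0) { |fp, h| fp ^ h }
  end
end

a = Document.new("the quick brown fox jumps over the lazy dog and then runs far away home")
b = Document.new("The quick brown fox jumps over the lazy dog and then runs far away home")
short = Document.new("too short")
[a, b, short].each(&:sim_hash!)

puts a.sim_hash == b.sim_hash   # texts differ only in case, so tokens match
puts short.sim_hash.inspect     # too few n-grams, so no fingerprint is set
```

Because `String#hash` is randomized per Ruby process, fingerprints produced this way can only be compared within a single run. Persisting them would need a stable hash function instead, which is a change the original gist does not make.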