@jpmckinney
jpmckinney / fingerprint.rb
Created November 17, 2011 21:46
Google Refine fingerprint clustering algorithm in Ruby
# coding: utf-8
# blog post: http://blog.slashpoundbang.com/post/12938588984/google-refine-fingerprint-clustering-algorithm-in-ruby
require 'unicode_utils/downcase'
class String
  # Normalize spaces and fingerprint.
  # http://code.google.com/p/google-refine/wiki/ClusteringInDepth
  # http://code.google.com/p/google-refine/source/browse/trunk/main/src/com/google/refine/clustering/binning/FingerprintKeyer.java
  def fingerprint