@tabidots
Created June 23, 2022 08:55
Jaro similarity in Clojure
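
For reference, the Jaro similarity of two strings $s_1$ and $s_2$ is commonly defined as follows. Here $m$ is the number of matching characters and $t$ is half the number of matched characters that appear in a different order in the two strings. Two equal characters count as matching only if their positions differ by at most $\lfloor \max(|s_1|, |s_2|)/2 \rfloor - 1$.

```latex
\[
\mathrm{sim}_j(s_1, s_2) =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise.}
\end{cases}
\]
```
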
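(defn ordered-matches
  "Returns the characters of s1 that also occur in s2 within the Jaro match
  window, in s1 order. Positions in s2 are not consumed, so one character of
  s2 can be matched by several characters of s1."
  [s1 s2]
  (let [enumerate   (fn [coll] (map-indexed vector coll))
        ;; Window is floor(max-length / 2) - 1, clamped at 0 so that
        ;; one-character strings can still match. quot also works in cljs.
        window-span (-> (max (count s1) (count s2))
                        (quot 2) dec (max 0))]
    (remove nil?
            (for [[i top] (enumerate s1)] ; cannot be put into a single for-comprehension
              (first
               (for [[j bottom] (enumerate s2)
                     :when (and (= top bottom)
                                ;; symmetric window: |i - j| <= window-span
                                (<= (- i window-span) j (+ i window-span)))]
                 top))))))
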
(defn jaro-similarity
  "Returns the Jaro similarity of two strings, from 0 (no matching
  characters) to 1 (exact match)."
  [s1 s2]
  (let [m1 (ordered-matches s1 s2)
        m2 (ordered-matches s2 s1)]
    (if (empty? m1)
      0
      (let [m           (count m1)
            ;; matched characters that sit in a different order in s1 and s2
            non-matches (->> (map = m1 m2) (filter false?) count)
            t           (/ non-matches 2)] ; transpositions
        (/ (+ (/ m (count s1)) (/ m (count s2)) (/ (- m t) m)) 3)))))
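
The functions above do not mark a position in s2 as used once it has been matched, whereas the usual definition of Jaro similarity pairs each character at most once. The following is a minimal, self-contained sketch of that one-to-one variant, not the gist author's method. The names `jaro-matches` and `jaro` are hypothetical, and returning a double rather than a ratio is an assumption made for convenience.

```clojure
(defn jaro-matches
  "Hypothetical helper: pairs each character of s1 with the first unused,
  equal character of s2 inside the match window. Returns a vector of
  [i j] index pairs in s1 order."
  [s1 s2]
  (let [v1     (vec s1)
        v2     (vec s2)
        ;; floor(max-length / 2) - 1, clamped at 0
        window (max 0 (dec (quot (max (count v1) (count v2)) 2)))]
    (first
     (reduce
      (fn [[pairs used] i]
        (let [lo (max 0 (- i window))
              hi (min (dec (count v2)) (+ i window))
              ;; first index in the window that is unused and holds an equal char
              j  (first (filter #(and (not (contains? used %))
                                      (= (v1 i) (v2 %)))
                                (range lo (inc hi))))]
          (if (some? j)
            [(conj pairs [i j]) (conj used j)]
            [pairs used])))
      [[] #{}]
      (range (count v1))))))

(defn jaro
  "Jaro similarity with one-to-one matching, returned as a double."
  [s1 s2]
  (let [pairs (jaro-matches s1 s2)
        m     (count pairs)]
    (if (zero? m)
      0.0
      (let [v1 (vec s1)
            v2 (vec s2)
            ;; matched characters of each string, each in its own order
            c1 (map (fn [[i _]] (v1 i)) pairs)
            c2 (map v2 (sort (map second pairs)))
            ;; transpositions: half the number of out-of-order matches
            t  (/ (count (filter false? (map = c1 c2))) 2)]
        (double (/ (+ (/ m (count v1)) (/ m (count v2)) (/ (- m t) m)) 3))))))

(jaro "MARTHA" "MARHTA") ; 17/18 as a double, about 0.944
(jaro "aaaa" "abcd")     ; => 0.5, only one \a of s1 can pair with the single \a of s2
```

Carrying the set of used indices through `reduce` keeps the sketch purely functional, at the cost of rescanning the window for every character of s1.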