@shayanjm
Created July 15, 2014 20:46
A rough implementation of Normalized Google Distance in Clojure
; Rough implementation of the Normalized Google Distance (NGD) algorithm.
; Assumed total number of indexed pages = 42,000,000,000.
; `google-search` is assumed to be defined elsewhere and to return a parsed
; response map with a string count under [:searchInformation :totalResults].
(defn get-ngd
  "Returns the Normalized Google Distance between two search terms.
  Returns nil if either query has no results or if the two terms never
  appear together. The closer the result is to 0, the more closely
  related the terms are."
  [term1 term2]
  (let [m    42000000000
        ;; Hit counts can exceed Integer/MAX_VALUE, so parse them as longs.
        hits (fn [query]
               (Long/parseLong
                 (get-in (google-search query) [:searchInformation :totalResults])))
        fx   (hits term1)
        fy   (hits term2)
        fxy  (hits (str term1 "+" term2))
        ;; Math/log10 of 0 is -Infinity, which surfaces below as NaN or Infinity.
        log-fx (Math/log10 fx)
        log-fy (Math/log10 fy)
        ngd  (/ (- (max log-fx log-fy) (Math/log10 fxy))
                (- (Math/log10 m) (min log-fx log-fy)))]
    ;; A NaN or infinite distance means at least one query returned no hits.
    (when-not (or (Double/isNaN ngd) (Double/isInfinite ngd))
      ngd)))
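
The arithmetic can be checked without calling any search API. Below is a minimal sketch, assuming the hit counts are already known. `ngd-from-counts` is a hypothetical helper, not part of the original gist, and the counts in the usage lines are made-up round numbers chosen so the logarithms come out as integers.

```clojure
;; Hypothetical helper (not in the original gist): computes NGD directly
;; from hit counts so the formula can be checked offline.
;;   fx, fy : hits for each term alone
;;   fxy    : hits for both terms together
;;   m      : assumed total number of indexed pages
(defn ngd-from-counts
  [fx fy fxy m]
  (let [lx  (Math/log10 (double fx))
        ly  (Math/log10 (double fy))
        lxy (Math/log10 (double fxy))
        ngd (/ (- (max lx ly) lxy)
               (- (Math/log10 (double m)) (min lx ly)))]
    ;; log10 of 0 is -Infinity, so a zero count yields NaN or Infinity here.
    (when-not (or (Double/isNaN ngd) (Double/isInfinite ngd))
      ngd)))

;; Made-up counts: (3 - 1) / (6 - 2)
(ngd-from-counts 1000 100 10 1000000)   ;; => 0.5

;; The terms never co-occur, so the distance is undefined.
(ngd-from-counts 1000 100 0 1000000)    ;; => nil
```

Keeping the formula separate from the search calls also makes it easier to try a different value for the assumed index size.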