@antoniogarrote
Created April 25, 2011 20:20
(ns backend.levenshtein-command
  (:use [somnium.congomongo :only [get-coll]])
  (:use [somnium.congomongo.coerce :only [coerce]])
  (:import [com.mongodb BasicDBObjectBuilder BasicDBObject]))

;; Builds the BSON document for the "levenshtein" database command.
;; level defaults to "word" and separators to " .;:"; limit and
;; outputField are only added when a value is provided.
(defn- build-bson-object
  ([collection-object source field threshold level separators limit output-field]
     (let [level (if (nil? level) "word" level)
           builder (BasicDBObjectBuilder/start)]
       (doto builder
         (.add "levenshtein" (.getName collection-object))
         (.add "sourceTerm" source)
         (.add "field" field)
         (.add "threshold" (float threshold))
         (.add "word" (= "word" (name level)))
         (.add "sentence" (= "sentence" (name level)))
         (.add "separators" (str (or separators " .;:"))))
       (when (not (nil? limit))
         (.add builder "limit" limit))
       (when (not (nil? output-field))
         (.add builder "outputField" output-field))
       (.get builder))))

(defn- execute-query
  ([^com.mongodb.DBCollection collection levenshtein-command]
     (let [db (.getDB collection)]
       (coerce (.command db levenshtein-command) [:mongo :clojure]))))

(defn levenshtein
  "Selects the documents in a collection whose similarity to the source text
   reaches the minimum threshold, a value in [0,1].
   Arguments:
   - collection: the collection where the query will be executed
   - source: the text to match
   - field: the document field to compare with source
   - threshold: minimum similarity for a match, [0,1]
   - opts: optional map with the following keys
     * :level 'word' (the default) will use chars in the string as tokens,
       'sentence' will split the string in words and use them as tokens
     * :separators when using the 'sentence' level, the chars used to split the
       original string (default: space, '.', ';' and ':')
     * :limit max number of matches to return
     * :output-field field to be returned from the matching object instead of
       the whole object"
  ([collection source field threshold & opts]
     (let [opts (or (first opts) {})
           collection-object (get-coll collection)
           bson-object (build-bson-object collection-object
                                          source
                                          field
                                          threshold
                                          (:level opts)
                                          (:separators opts)
                                          (:limit opts)
                                          (:output-field opts))]
       (execute-query collection-object bson-object))))
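
The matching itself runs on the server and is not shown in this gist. As a rough illustration of what the `threshold`, `level` and `separators` options describe, here is a minimal client-side sketch in plain Clojure. The helpers `edit-distance`, `tokens`, `similarity` and `matches?` are hypothetical names, and normalizing the similarity as `1 - distance / longest-length` is an assumption, not necessarily what the server command computes.

```clojure
;; Hypothetical helpers, for illustration only; not part of the gist.

(defn edit-distance
  "Levenshtein distance between two token sequences.
   A string is treated as a sequence of chars."
  [a b]
  (peek
   (reduce
    (fn [prev-row [i x]]
      (reduce
       (fn [row [j y]]
         (conj row (min (inc (peek row))                          ; insertion
                        (inc (nth prev-row (inc j)))              ; deletion
                        (+ (nth prev-row j) (if (= x y) 0 1)))))  ; substitution
       [(inc i)]
       (map-indexed vector b)))
    (vec (range (inc (count b))))
    (map-indexed vector a))))

(defn tokens
  "'word' level: the chars of s. 'sentence' level: s split on the separator chars."
  [level separators s]
  (if (= "sentence" (name level))
    (let [sep? (set separators)]
      (->> (partition-by (comp boolean sep?) s)
           (remove #(sep? (first %)))
           (map #(apply str %))))
    (seq s)))

(defn similarity
  "Assumed normalization: 1 - distance / length of the longer token sequence."
  [a b]
  (let [longest (max (count a) (count b))]
    (if (zero? longest)
      1.0
      (- 1.0 (/ (edit-distance a b) (double longest))))))

(defn matches?
  "True when the similarity of source and candidate reaches threshold."
  [source candidate threshold {:keys [level separators]
                               :or {level "word" separators " .;:"}}]
  (>= (similarity (tokens level separators source)
                  (tokens level separators candidate))
      threshold))

(edit-distance "kitten" "sitting") ;; => 3
```

Under these assumptions, `(matches? "the quick brown fox" "the quick red fox" 0.75 {:level :sentence})` compares four word tokens with one substitution, while the default 'word' level compares the same strings char by char.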