@dbasch
Created November 26, 2012 23:08
Indexing text files using transient state in Clojure
;; Create an in-memory mapping from each word to the set of files that contain it.
;; If you kill the server, you'll have to reindex everything.
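;; Assumes `tokenize` (not shown in this gist) returns a sequence of word strings.
(require '[clojure.set :refer [union]])
(defn build-index [dirname]
  (loop [files (->> (.listFiles (java.io.File. dirname))
                    (filter #(.isFile %))   ; skip subdirectories, which slurp cannot read
                    (map #(.getName %)))
         idx   (transient {})]
    (if (empty? files)
      (persistent! idx)
      (recur
        (rest files)
        ;; Fold one file's tokens into the transient map, always keeping
        ;; the value returned by assoc!.
        (loop [tokens (-> (str dirname "/" (first files)) slurp tokenize)
               iidx   idx]
          (if (empty? tokens)
            iidx
            (recur
              (rest tokens)
              (assoc! iidx (first tokens)
                      (union (iidx (first tokens)) #{(first files)})))))))))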
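
The snippet above calls `tokenize` without defining it, and it needs a directory of files to run. The sketch below is one way to try the same transient-map technique at the REPL. Everything in it is an assumption for illustration: `tokenize` here simply lowercases the text and keeps runs of word characters (the gist's real tokenizer may differ), and `index-docs` is a hypothetical variant that indexes an in-memory map of document name to text instead of reading files.

```clojure
(require '[clojure.set :refer [union]]
         '[clojure.string :as str])

;; Hypothetical tokenizer: lowercase the text and return each run of
;; word characters. Not the gist's own tokenize, which is not shown.
(defn tokenize [text]
  (re-seq #"\w+" (str/lower-case text)))

;; Hypothetical in-memory variant of build-index.
;; docs is a map of document name -> document text.
(defn index-docs [docs]
  (persistent!
    (reduce (fn [idx [doc-name text]]
              (reduce (fn [iidx token]
                        ;; Use the value returned by assoc!, never the old binding.
                        (assoc! iidx token
                                (union (get iidx token) #{doc-name})))
                      idx
                      (tokenize text)))
            (transient {})
            docs)))

(def idx (index-docs {"a.txt" "Hello world"
                      "b.txt" "hello Clojure"}))

(get idx "hello")    ; => #{"a.txt" "b.txt"}
(get idx "clojure")  ; => #{"b.txt"}
```

Using `reduce` instead of nested `loop`/`recur` keeps the same discipline that makes transients safe: each step passes along the map returned by `assoc!`, and `persistent!` is called once at the end.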