The purpose of this program is to solve the following problem posed by Jon Bentley:
Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.
(let [words (re-seq #"\w+" (slurp input))
word-freq-map (frequencies (map clojure.string/lower-case words))] ; {w₁ f₁, w₂ f₂,...}
(take k ; ([w₁ f₁], [w₂ f₂],...), f₁ > f₂, w₁ > w₂
(sort-by identity (fn [[w₁ f₁] [w₂ f₂]]
(let [frqs-comp (- (compare f₁ f₂))]
(if-not (zero? frqs-comp) frqs-comp
(compare w₁ w₂))))
(into [] word-freq-map))))
With num-or
macro, for simpler comparsion of words and frequencies:
(defmacro numor
([] 0)
([x] x)
([x & next] `(let [numor# ~x] (if-not (zero? numor#) numor# (numor ~@next)))))
(defn nonzero [coll] ; just like numor but fn
(first (filter (complement zero?) coll)))
(let [words (re-seq #"\w+" (slurp input))]
(take k ; ([w₁ f₁], [w₂ f₂],...), f₁ > f₂, w₁ > w₂
(sort-by identity (fn [[w₁ f₁] [w₂ f₂]] (numor (- (compare f₁ f₂)) (compare w₁ w₂)))
(into [] (frequencies (map clojure.string/lower-case words))))))
For input:
Functional programming is like describing your problem to a
mathematician. Imperative programming is like giving instructions to
an idiot.
-- arcus, #scheme on Freenode
Output:
(["is" 2] ["like" 2] ["programming" 2] ["to" 2]
["a" 1] ["an" 1] ["arcus" 1] ["describing" 1] ["freenode" 1] ["functional" 1]
["giving" 1] ["idiot" 1] ["imperative" 1] ["instructions" 1] ["mathematician" 1]
["on" 1] ["problem" 1] ["scheme" 1] ["your" 1])
(let [words (re-seq #"\w+" (slurp input))
word-freq-map (frequencies (map clojure.string/lower-case words)) ; {w₁ f₁, w₂ f₂,...}
freq-words-map (reduce-kv (fn [r k v] (assoc r v (clojure.set/union (get r v) #{k}))) {} word-freq-map)] ; {f₁ #{w₁₁,...}, f₂ #{w₂₁,...}}
(into (sorted-map-by >) freq-words-map)) ; {f₁ #{w₁₁,...}, f₂ #{w₂₁,...}}, f₁ > f₂
For this input:
Functional programming is like describing your problem to a
mathematician. Imperative programming is like giving instructions to
an idiot.
-- arcus, #scheme on Freenode
This output:
{2 #{"is" "like" "programming" "to"},
1 #{"a" "functional" "instructions" "scheme" "describing"
"imperative" "an" "freenode" "giving" "arcus" "your" "idiot"
"mathematician" "on" "problem"}}