Skip to content

Instantly share code, notes, and snippets.

@zahardzhan
Last active July 31, 2016 05:26
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save zahardzhan/78399cb6c50006a1da82 to your computer and use it in GitHub Desktop.
Save zahardzhan/78399cb6c50006a1da82 to your computer and use it in GitHub Desktop.
Program Pearls, Continued: Common Words, Clojure

Program Pearls, Continued: Common Words (1986)

from Literate Programming, D. Knuth, 1992, Chapter 6.

The purpose of this program is to solve the following problem posed by Jon Bentley:

Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.

Solution

(let [words (re-seq #"\w+" (slurp input))
      word-freq-map (frequencies (map clojure.string/lower-case words))] ; {w₁ f₁, w₂ f₂,...}
  (take k ; ([w₁ f₁], [w₂ f₂],...), f₁ > f₂, w₁ > w₂
    (sort-by identity (fn [[w₁ f₁] [w₂ f₂]]
                        (let [frqs-comp (- (compare f₁ f₂))] 
                          (if-not (zero? frqs-comp) frqs-comp
                            (compare w₁ w₂))))
      (into [] word-freq-map))))

With num-or macro, for simpler comparsion of words and frequencies:

(defmacro numor
  ([] 0)
  ([x] x)
  ([x & next] `(let [numor# ~x] (if-not (zero? numor#) numor# (numor ~@next)))))
  
(defn nonzero [coll] ; just like numor but fn
  (first (filter (complement zero?) coll)))

(let [words (re-seq #"\w+" (slurp input))]
  (take k ; ([w₁ f₁], [w₂ f₂],...), f₁ > f₂, w₁ > w₂
    (sort-by identity (fn [[w₁ f₁] [w₂ f₂]] (numor (- (compare f₁ f₂)) (compare w₁ w₂)))
      (into [] (frequencies (map clojure.string/lower-case words))))))

For input:

Functional programming is like describing your problem to a
mathematician.  Imperative programming is like giving instructions to
an idiot.
-- arcus, #scheme on Freenode

Output:

(["is" 2] ["like" 2] ["programming" 2] ["to" 2] 
 ["a" 1] ["an" 1] ["arcus" 1] ["describing" 1] ["freenode" 1] ["functional" 1] 
 ["giving" 1] ["idiot" 1] ["imperative" 1] ["instructions" 1] ["mathematician" 1] 
 ["on" 1] ["problem" 1] ["scheme" 1] ["your" 1])

Alternative solution

(let [words (re-seq #"\w+" (slurp input))
      word-freq-map (frequencies (map clojure.string/lower-case words)) ; {w₁ f₁, w₂ f₂,...}
      freq-words-map (reduce-kv (fn [r k v] (assoc r v (clojure.set/union (get r v) #{k}))) {} word-freq-map)] ; {f₁ #{w₁₁,...}, f₂ #{w₂₁,...}}
  (into (sorted-map-by >) freq-words-map)) ; {f₁ #{w₁₁,...}, f₂ #{w₂₁,...}}, f₁ > f₂

For this input:

Functional programming is like describing your problem to a
mathematician.  Imperative programming is like giving instructions to
an idiot.
-- arcus, #scheme on Freenode

This output:

{2 #{"is" "like" "programming" "to"}, 
 1 #{"a" "functional" "instructions" "scheme" "describing" 
     "imperative" "an" "freenode" "giving" "arcus" "your" "idiot" 
     "mathematician" "on" "problem"}}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment