@ghoseb ghoseb/freq.clj
Last active Dec 12, 2015

Sample code from the Pune Clojure Dojo Meetup session held on 8 Feb 2013.
(ns ^{:doc "Word frequencies in a text file."
      :author "Baishampayan Ghose <b.ghose@helpshift.com>"}
  meetup.freq
  (:require [clojure.java.io :as io]
            [clojure.string :as s]))

(def stop-word? #{"is" "the" "am" "i" "that" "if"}) ;; fill it up!

;;; all these functions are written in a "point free" style

;; Lazy seq of lines; note that the reader is never closed explicitly.
(def get-lines (comp line-seq io/reader))

;; Split every line into words and concatenate the results.
(def get-words (partial mapcat (partial re-seq #"\w+")))

(def lowercase-words (partial map s/lower-case))

(def remove-stop-words (partial remove stop-word?))

;; Build a map of word -> count; fnil makes a missing count start at 0.
(defn count-freqs
  [coll]
  (reduce (fn [res word]
            (update-in res [word] (fnil inc 0)))
          {} coll))

;; Sort map entries by descending count.
(def sort-map (partial sort-by (comp - val)))

;; Top n words by frequency (n defaults to 10).
(defn word-freqs
  ([file]
     (word-freqs file 10))
  ([file n]
     (->> file
          get-lines
          get-words
          lowercase-words
          remove-stop-words
          count-freqs
          sort-map
          (take n))))
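
A minimal sketch for trying the same pipeline at the REPL without a file. It is not part of the session code: the namespace `meetup.freq-example` and the function `word-freqs-from-lines` are hypothetical names, and `clojure.core/frequencies` stands in for the hand-written `count-freqs`, which builds the same kind of word-to-count map.

```clojure
(ns meetup.freq-example
  (:require [clojure.string :as s]))

;; Same stop-word set as in the session code.
(def stop-word? #{"is" "the" "am" "i" "that" "if"})

;; Hypothetical helper: same steps as word-freqs, but it takes a seq of
;; lines that is already in memory instead of reading them from a file.
(defn word-freqs-from-lines
  [lines n]
  (->> lines
       (mapcat (partial re-seq #"\w+")) ; split each line into words
       (map s/lower-case)               ; normalise case
       (remove stop-word?)              ; drop stop words
       frequencies                      ; word -> count, like count-freqs
       (sort-by (comp - val))           ; highest count first
       (take n)))

;; "cat" appears twice and every other surviving word once,
;; so the first entry is ["cat" 2].
(first (word-freqs-from-lines ["The cat is on the mat" "I saw that cat"] 2))
```

The order of entries with equal counts depends on the iteration order of the underlying map, so only the leading entry is safe to rely on in this example.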