@practicalli-johnny
Created November 1, 2016 15:31
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Most Common Words - Threading Macro Style
;; Book: The Importance of Being Earnest, Oscar Wilde
;; Source: Project Gutenberg (UTF-8 format)
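;; Make sure clojure.string is loaded before its functions are used below
(require 'clojure.string)
(def importance-of-being-earnest
  (slurp "http://www.gutenberg.org/cache/epub/844/pg844.txt"))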
(def common-english-words
  (-> (slurp "http://www.textfixer.com/resources/common-english-words.txt")
      (clojure.string/split #",")
      set))
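;; A minimal offline sketch of the same parsing step, assuming a short
;; hypothetical comma-separated string in place of the downloaded file.
;; A Clojure set is also a function of its elements: it returns the element
;; when present and nil otherwise, which is why a set can be handed to
;; remove as a predicate.
(require 'clojure.string)
(def example-stop-words            ; hypothetical name, for illustration only
  (-> "a,an,and,the,of"            ; hypothetical sample data
      (clojure.string/split #",")
      set))
(example-stop-words "the")         ; => "the" (truthy, so remove would drop it)
(example-stop-words "earnest")     ; => nil   (falsey, so remove would keep it)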
(defn most-common-words [book]
  (->> book
       (re-seq #"[a-zA-Z0-9']+")
       (map #(clojure.string/lower-case %))
       (remove common-english-words)
       frequencies
       (sort-by val)
       reverse))
(most-common-words importance-of-being-earnest)
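;; A small offline sketch of the same steps, assuming a hypothetical sample
;; sentence and stop-word set so that it runs without network access.
;; sample-text and sample-stop-words are illustrative names, not part of
;; the original code.
(require 'clojure.string)
(def sample-text "The truth is rarely pure and never simple. The truth!")
(def sample-stop-words #{"the" "is" "and"})
(->> sample-text
     (re-seq #"[a-zA-Z0-9']+")          ; split the text into words
     (map clojure.string/lower-case)    ; normalise case
     (remove sample-stop-words)         ; drop the stop words
     frequencies                        ; map of word -> count
     (sort-by val)                      ; ascending by count
     reverse                            ; most frequent first
     first)
;; => ["truth" 2]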
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Deconstructing the code in the REPL
;; To understand what each function in most-common-words does, comment out
;; one or more expressions by placing the discard reader macro #_ in front of
;; each expression. The reader then skips that form entirely.
(defn most-common-words [book]
  (->> book
       #_(re-seq #"[a-zA-Z0-9']+")
       #_(map #(clojure.string/lower-case %))
       #_(remove common-english-words)
       #_frequencies
       #_(sort-by val)
       #_reverse))
;; With every step discarded, most-common-words simply returns book (the full
;; text of the book). To see what each step does, remove the #_ from the front
;; of one expression at a time, working from the top down, and re-evaluate the
;; most-common-words definition in the REPL.
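;; A sketch of that step-by-step approach, assuming a short hypothetical
;; sample string instead of the full book. Only the first two steps are
;; enabled here; first-two-steps is an illustrative name, not part of the
;; original code.
(require 'clojure.string)
(defn first-two-steps [text]
  (->> text
       (re-seq #"[a-zA-Z0-9']+")
       (map #(clojure.string/lower-case %))
       #_(remove common-english-words)
       #_frequencies
       #_(sort-by val)
       #_reverse))
(first-two-steps "I never travel without my diary")
;; => ("i" "never" "travel" "without" "my" "diary")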