@bostonaholic
Last active August 29, 2015 14:08
Top 10 Clojure Features Requested in 2014

Source data: http://cdn.cognitect.com/stateofclojure/2014/clj-feature.txt

```
$ cat /tmp/clj-feature.txt | grep -v '^$' | sort | uniq -c | sort -nr | head
  10 "feature expressions"
   7 "Types"
   5 "pattern matching"
   5 "fast startup"
   5 "Feature expressions"
   4 "debugger"
   4 "Static typing"
   4 "Faster startup"
   3 "type checking"
   3 "none"
```
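For comparison, the shell pipeline above can be sketched in Clojure. This is a minimal sketch, not something from the original gist: `top-lines` is a hypothetical name, and like `uniq -c` it only counts lines that match exactly. Ties may come out in a different order than with `sort -nr`.

```clojure
;; Rough equivalent of:
;;   grep -v '^$' | sort | uniq -c | sort -nr | head
;; Counts exact (case-sensitive) lines, skipping empty ones, and
;; returns the n most frequent as [line count] pairs.
(defn top-lines [n lines]
  (->> lines
       (remove empty?)   ; grep -v '^$'
       frequencies       ; sort | uniq -c
       (sort-by val >)   ; sort -nr
       (take n)))        ; head

;; Usage (hypothetical REPL session):
;; (with-open [rdr (clojure.java.io/reader "/tmp/clj-feature.txt")]
;;   (doall (top-lines 10 (line-seq rdr))))
```

`doall` forces the lazy result while the reader is still open.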
@bostonaholic (Author)

I really should split the lines into bigrams/trigrams and count frequencies from those. As it stands, a line is only counted when the entire line matches exactly.
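Part of the problem is visible in the top 10 above: `"feature expressions"` and `"Feature expressions"` are counted separately because the match is case-sensitive. As a hedged middle ground (not what the author ran), each line could be normalized (lower-cased, double quotes stripped, whitespace trimmed) before whole lines are counted. `normalize-line` and `top-normalized-lines` are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Lower-case, drop double quotes, trim surrounding whitespace.
(defn normalize-line [line]
  (-> line
      str/lower-case
      (str/replace "\"" "")
      str/trim))

;; Count whole lines after normalizing, so case and quoting
;; differences no longer split the counts.
(defn top-normalized-lines [n lines]
  (->> lines
       (map normalize-line)
       (remove str/blank?)
       frequencies
       (sort-by val >)
       (take n)))
```

This would merge the two `feature expressions` rows into one with a count of at least 15 (10 + 5 from the output above), but it still treats "fast startup" and "faster startup" as different answers, which is where splitting into n-grams helps.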

@bostonaholic (Author)

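UPDATE: Even with the more correct n-gram approach below, the results were pretty much the same: feature expressions, pattern matching, static typing / core.typed, and faster startup still lead.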

```clojure
(require '[clojure.string :as str]
         '[clojure.java.io :as io])

;; Sliding window of n consecutive items, e.g.
;; (ngram 2 ["a" "b" "c"]) => (("a" "b") ("b" "c"))
(defn ngram [n coll]
  (partition n 1 coll))

;; Each takes a seq of word lists (one per line) and counts n-grams
;; across all lines; an n-gram never spans two lines.
(def word-freqs    (comp frequencies (partial mapcat (partial ngram 1))))
(def bigram-freqs  (comp frequencies (partial mapcat (partial ngram 2))))
(def trigram-freqs (comp frequencies (partial mapcat (partial ngram 3))))

;; Split on non-word characters and drop the empty strings.
;; Note: \W also matches apostrophes, so "don't" becomes "don" "t".
(defn words [line]
  (remove empty? (str/split line #"\W")))

;; The 10 most frequent [n-gram count] entries, highest first.
(defn top-10 [freqs]
  (take 10 (reverse (sort-by second freqs))))

(defn sorted-word-freqs [lines]
  (top-10 (word-freqs (map words lines))))

(defn sorted-bigram-freqs [lines]
  (top-10 (bigram-freqs (map words lines))))

(defn sorted-trigram-freqs [lines]
  (top-10 (trigram-freqs (map words lines))))

;; Lower-case the line and remove double quotes.
(defn normalize [line]
  (-> line
      str/lower-case
      (str/replace #"\"" "")))

;; Put "\n" after every item except the last, so println
;; shows one entry per line without dropping the last entry.
(defn multiple-lines [coll]
  (drop-last (interleave coll (repeat "\n"))))

;; line-seq is lazy, so everything is printed inside with-open.
(with-open [rdr (io/reader "/tmp/clj-feature.txt")]
  (let [lines      (line-seq rdr)
        normalized (map normalize lines)
        bfs        (sorted-bigram-freqs normalized)
        tfs        (sorted-trigram-freqs normalized)]
    (println "\nBigrams:\n" (multiple-lines bfs) "\n")
    (println "\nTrigrams:\n" (multiple-lines tfs) "\n")))
```
Output:

```
Bigrams:
 ([(feature expressions) 22] 
 [(pattern matching) 16] 
 [(core typed) 16] 
 [(static typing) 14] 
 [(error messages) 14] 
 [(startup time) 12] 
 [(support for) 12] 
 [(faster startup) 11] 
 [(better error) 11] 
 [(fast startup) 10])

Trigrams:
 ([(better error messages) 7] 
 [(i don t) 6] 
 [(tail call optimization) 6] 
 [(optional static typing) 5] 
 [(fast startup time) 4] 
 [(faster startup time) 4] 
 [(clojure in clojure) 4] 
 [(common lisp style) 4] 
 [(for core typed) 3] 
 [(better stack traces) 3])
```
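The `(i don t)` trigram is a tokenizer artifact: `words` splits on `\W`, and the apostrophe is a non-word character, so "don't" becomes "don" and "t". Below is a minimal sketch of an alternative tokenizer, assuming ASCII apostrophes and keeping them only when they sit between word characters. `words-keep-apostrophes` is a hypothetical name, and rerunning the counts with it would change the numbers shown above.

```clojure
;; Match runs of word characters, allowing an ASCII apostrophe only
;; between word characters ("don't"), so contractions stay whole.
;; Returns nil when the line contains no words.
(defn words-keep-apostrophes [line]
  (re-seq #"\w+(?:'\w+)*" line))

;; Same sliding-window helper as in the code above.
(defn ngram [n coll]
  (partition n 1 coll))

;; Usage:
;; (ngram 3 (words-keep-apostrophes "i don't know"))
;; => (("i" "don't" "know"))
```

It still splits on `.`, so `core.typed` yields "core" and "typed", which is how the `(core typed)` bigram arises.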
