@bobpoekert
Last active December 18, 2015 08:29
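;; Assumed requires: the gist omits its ns form. The aliases (string, json, c)
;; match those used below; the namespace name is hypothetical.
(ns cooccurrence.core
  (:require [clojure.string :as string]
            [clojure.data.json :as json]
            [cascalog.api :refer :all]
            [cascalog.ops :as c]))

;; Parses one input line of the form "<json-key>\t<count>", where the key is a
;; JSON array holding two arrays of tokens. Returns [k1 k2 count], with each
;; token array joined by spaces.
(defn parse-row
  [^String row]
  (let [[k v] (.split row "\t") ; String.split takes a regex string, not a char
        [k1 k2] (map (partial string/join " ") (json/read-str k))]
    [k1 k2 (Long/parseLong (.trim v))]))
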
;; Query that reads the text lines under dir and emits one
;; [?k1 ?k2 ?count] tuple per line.
(defn co-occurrence
  [dir]
  (let [source (hfs-textline dir)]
    (<- [?k1 ?k2 ?count]
        (source ?line)
        (parse-row ?line :> ?k1 ?k2 ?count))))

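;; For each co-occurring pair, computes ?prob from the global count total,
;; the total for ?k1, and the pair's own count.
;; correlation-prob is not defined in this snippet and must be supplied elsewhere.
(defn discrim-probs
  [dir]
  (let [occurrences (co-occurrence dir)
        ;; Sum of all counts.
        global-total (<- [?total]
                         (occurrences _ _ ?c)
                         (c/sum ?c :> ?total))
        ;; Sum of counts per first key.
        total (<- [?k1 ?sum]
                  (occurrences ?k1 _ ?c)
                  (c/sum ?c :> ?sum))]
    (<- [?k1 ?k2 ?prob]
        (global-total ?total)
        (total ?k1 ?dict-total)
        ;; ?ngram-count is not passed to correlation-prob; this predicate only
        ;; restricts results to pairs whose ?k2 also occurs as a first key.
        (total ?k2 ?ngram-count)
        (occurrences ?k1 ?k2 ?count)
        (correlation-prob ?total ?dict-total ?count :> ?prob))))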