@dpk
Created January 9, 2012 14:38
Bayesian text filter in Plan.
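; Per-word counts for spam and ham. The 'default tag is relied on below so that unseen words count as 0.
(set spam-words* {}) (tag spam-words* 'default 0) ; todo: implement :->tagged
(set good-words* {}) (tag good-words* 'default 0)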
; Split text on runs of non-word characters and turn each piece into a symbol.
(deffn (tokens body)
  (map symbol (regexp-split body /\W+/)))
; Combine probabilities: p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn)).
(deffn (bayes &fs)
  (with (pfs (apply * fs)
         ifs (map (fn (x) (- 1 x)) fs))
    (/ pfs (+ pfs (apply * ifs)))))
; Spam probability of a single word, clamped to [0.001, 0.999]; 0.5 if the word was never trained on.
(deffn (word-prob word)
  (with (sscore (spam-words* word)
         hscore (good-words* word))
    (if (or (include? spam-words* word) (include? good-words* word))
        (ensure-range 0.001 (/ sscore (+ sscore hscore)) 0.999)
        0.5)))
; Add one to the count of every token in text. s/h is 'spam for spam; any other value trains as ham.
(deffn (train s/h text)
  (with (words (tokens text)
         dict (if (= s/h 'spam) spam-words* good-words*))
    (each word words
      (set (dict word) (+ (dict word) 1)))))
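; Classify text: score each distinct word, sort by score (highest first), combine the
; first 15 scores with bayes, and report spam when the result exceeds 0.6.
(deffn (spam-or-ham? text)
  (with (scored-words (map (fn (w) (list w (word-prob w))) (uniques (tokens text)))
         ranked-words (sort (comparative > cadr) scored-words))
    (let msg-prob (apply bayes (map cadr (ranked-words 0 15)))
      (if (> msg-prob 0.6)
          `(spam ,msg-prob)
          `(ham ,msg-prob)))))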
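
For readers who do not know Plan, the arithmetic in `bayes` and `word-prob` can be sketched in Python. This is a rough stand-in, not a port of the gist: the names `word_prob`, `spam_words` and `good_words` are hypothetical, and plain dicts with `.get(word, 0)` are assumed to play the role of the tables with a default of 0.

```python
from math import prod

def bayes(*fs):
    """Combine per-word probabilities the way the Plan `bayes` does."""
    pfs = prod(fs)                    # p1 * p2 * ... * pn
    ifs = prod(1 - x for x in fs)     # (1-p1) * (1-p2) * ... * (1-pn)
    return pfs / (pfs + ifs)

def word_prob(word, spam_words, good_words):
    """Spam probability of one word from raw counts (hypothetical dict arguments)."""
    s = spam_words.get(word, 0)       # unseen words count as 0
    h = good_words.get(word, 0)
    if s + h == 0:
        return 0.5                    # never trained on: neutral
    # Clamp so a single word can never force the product to 0 or 1.
    return min(0.999, max(0.001, s / (s + h)))
```

With two words that each score 0.9, `bayes(0.9, 0.9)` comes out at roughly 0.988, which shows how a few strongly spammy words push a message well past the 0.6 threshold.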