@5kg
Created June 27, 2013 16:16
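# Expects ARGV[0] to be a directory containing train/spam, train/ham, test/spam and test/ham. Each file is read as ISO-8859-1, non-letters become spaces, and the text is split into words.
spam_train, ham_train, spam_test, ham_test = ['train/spam', 'train/ham', 'test/spam', 'test/ham'].map{|t| Dir["#{ARGV[0]}/#{t}/*"].map {|fn| File.open(fn, 'r:iso8859-1').read.gsub(/[^a-zA-Z]/, ' ').split}}
# Per class: relative frequency of each word over all training words; keep words longer than 2 characters with frequency above 8e-6, then replace each frequency with its log.
spam_log, ham_log = [spam_train, ham_train].map{|t| t.flatten.instance_eval {reduce(Hash.new(0)) { |h,v| h[v] += 1.0/size; h }.select{|w, v| w.size > 2 && v > 8e-6}}.instance_eval{each {|k,v| self[k] = Math.log(v)}}}
# Score every test document under both models by summing the log frequencies of its known words; words missing from a model are skipped. Each result is [spam_score, ham_score].
spam_predict, ham_predict = [spam_test, ham_test].map {|t| t.map{|d| [spam_log, ham_log].map {|log| d.reduce(0){|s, w| log[w] ? s + log[w] : s}}}}
# Print the number of spam test documents, then how many of them scored lower under the spam model than under the ham model (misclassified).
p spam_predict.size, spam_predict.select{|e| e.first < e.last}.size
# Print the number of ham test documents, then how many of them scored higher under the spam model than under the ham model (misclassified).
p ham_predict.size, ham_predict.select{|e| e.first > e.last}.size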
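
The scoring rule used above can be easier to follow on a tiny in-memory corpus. The following is a minimal sketch, not the gist's code: the toy documents and the helper names `log_freqs`, `score`, and `classify` are hypothetical, and the word-length and frequency filters are left out for brevity. It keeps the same behavior of skipping words a model has never seen.

```ruby
# Hypothetical toy corpora standing in for the train/spam and train/ham directories.
spam_docs = [%w[free money now], %w[free offer now]]
ham_docs  = [%w[meeting now], %w[money report meeting]]

# Log of each word's relative frequency among all words of one class.
def log_freqs(docs)
  words = docs.flatten
  counts = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
  counts.transform_values { |c| Math.log(c.to_f / words.size) }
end

# Sum the log frequencies of the words the model knows.
# Unknown words contribute nothing, as in the script above (no smoothing).
def score(model, doc)
  doc.sum { |w| model.fetch(w, 0.0) }
end

# Pick the class whose model gives the higher score (no class prior).
def classify(doc, spam_log, ham_log)
  score(spam_log, doc) > score(ham_log, doc) ? :spam : :ham
end

spam_log = log_freqs(spam_docs)
ham_log  = log_freqs(ham_docs)

p classify(%w[now money], spam_log, ham_log)
p classify(%w[free now], spam_log, ham_log)
```

One consequence of skipping unknown words is visible in the second call: "free" occurs only in the toy spam data, so it lowers the spam score while leaving the ham score untouched, which pushes the document toward ham.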