@mac389
Created August 28, 2013 21:42
Better NB classifier that uses our curated list of stopwords
require 'rubygems'
require 'stuff-classifier'
require 'spreadsheet'
toxifier = StuffClassifier::TfIdf.new("tox")
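# Map the numeric rating stored in each spreadsheet row to a category label.
rating = { '0' => :no, '1' => :yes, '2' => :maybe }
# Strip trailing newlines so each stopword can match a token exactly.
toxifier.ignore_words = File.readlines('stopwords').map(&:strip).reject(&:empty?)
# Each line of alcohol_MC.txt is passed to Spreadsheet.open, so strip newlines here too.
curated_data = File.readlines('alcohol_MC.txt').map(&:strip).reject(&:empty?)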
curated_data.each do |spread|
  book = Spreadsheet.open(spread)
  sheet1 = book.worksheet(0)
  sheet1.each do |row|
    # Column 0 holds the text, column 1 its rating; skip unrated rows.
    next if row[1].nil?
    toxifier.train(rating[row[1].to_i.to_s], row[0])
  end
end
puts toxifier.classify(' being able to talk to my manager about boys and alcohol')
puts toxifier.classify(' 80party party awesomeness citylife ohshit alcohol drinking smile http t co tq6pgur2qn')
puts toxifier.classify(' i think alcohol is still in my system')
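
The script's two input-handling steps (reading a word list and turning a rating cell into a label) can be checked in isolation, without either gem. The sketch below is one possible way to do that. The helper names `load_list` and `label_for` and the `RATING` constant are hypothetical and not part of the gist or of stuff-classifier. Unlike the `to_i.to_s` lookup in the script, which turns any non-numeric cell into `'0'` and therefore `:no`, this version returns `nil` for cells it cannot parse, so the caller can skip them.

```ruby
# Hypothetical helper: read one entry per line, dropping newlines and blank lines.
def load_list(path)
  File.readlines(path).map(&:strip).reject(&:empty?)
end

# Same three labels as the script, keyed by integer instead of string.
RATING = { 0 => :no, 1 => :yes, 2 => :maybe }.freeze

# Hypothetical helper: map a spreadsheet cell (1, 1.0, "1", ...) to a label.
# Returns nil for empty, non-numeric, or out-of-range cells.
def label_for(cell)
  return nil if cell.nil?
  RATING[Integer(Float(cell))]
rescue ArgumentError, TypeError
  nil
end
```

In the training loop this would be used as `label = label_for(row[1])` followed by `toxifier.train(label, row[0]) if label`, assuming rows with unparseable ratings should be ignored rather than trained as `:no`.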