@jbrechtel
Created July 25, 2009 21:01
require 'rubygems'
require 'bishop'
require 'twitter'

class TwitterClassifier
  @@spam_threshold = 0.3

  def initialize(knowledge_file=nil)
    @bishop = Bishop::Bayes.new
    @bishop.load(knowledge_file) if knowledge_file
  end
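
  # Persist the trained classifier so it can be reloaded via initialize.
  def save(knowledge_file)
    @bishop.save(knowledge_file)
  end

  # Fetch the text of up to 50 recent tweets from the given user.
  def get_tweets(user)
    Twitter::Search.new.from(user).per_page(50).extend(Enumerable).map { |tweet| tweet.text }
  end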

  # Train the classifier: every recent tweet from user is an example of category.
  def classify(category, user)
    tweets = get_tweets(user)
    tweets.each { |tweet| @bishop.train(category, tweet) }
  end

  # Returns the fraction (0.0 to 1.0) of the user's recent tweets that score as spam.
  def how_spammy(user)
    tweets = get_tweets(user)
    return 0.0 if tweets.empty? # nothing to score; avoids dividing by zero

    spam_count = 0.0
    tweets.each do |tweet|
      results = @bishop.guess(tweet)
      normal_result = results.find { |result| result[0] == "normal" }
      spam_result = results.find { |result| result[0] == "spam" }
      spam_pct = spam_result.nil? ? 0 : spam_result[1]
      normal_pct = normal_result.nil? ? 0 : normal_result[1]
      # A tweet counts as spam when its spam score beats its normal score by the threshold.
      result_difference = spam_pct - normal_pct
      spam_count += 1 if result_difference > @@spam_threshold
    end
    spam_count / tweets.size
  end
end
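
The scoring step in how_spammy can be tried without the bishop or twitter gems. The following is a minimal sketch, not part of the original gist: spam_ratio and SPAM_THRESHOLD are hypothetical names, and the input shape (one list of [category, score] pairs per tweet) is an assumption taken from how the code above indexes the guess results with result[0] and result[1].

```ruby
# Hypothetical stand-in for @@spam_threshold in TwitterClassifier.
SPAM_THRESHOLD = 0.3

# guesses_per_tweet: one entry per tweet, each an array of [category, score]
# pairs (assumed shape, inferred from the gist's use of result[0]/result[1]).
def spam_ratio(guesses_per_tweet, threshold = SPAM_THRESHOLD)
  return 0.0 if guesses_per_tweet.empty?

  spam_count = guesses_per_tweet.count do |results|
    # A category missing from the results scores 0.0.
    scores = Hash.new(0.0)
    results.each { |category, score| scores[category] = score }
    (scores["spam"] - scores["normal"]) > threshold
  end

  spam_count / guesses_per_tweet.size.to_f
end

sample = [
  [["spam", 0.9], ["normal", 0.1]], # difference 0.8: counted as spam
  [["spam", 0.5], ["normal", 0.4]], # difference about 0.1: not counted
  [["normal", 0.7]],                # no spam score at all: not counted
  [["spam", 0.6]]                   # no normal score: difference 0.6, counted
]
puts spam_ratio(sample) # prints 0.5
```

Two of the four sample tweets clear the 0.3 margin, so the ratio is 0.5.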