@plagi
Forked from mischa/ledes.rb
Created June 26, 2012 14:06
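```ruby
require 'rubygems'
require 'stemmer'
require 'classifier'

# Guesses which section a post belongs to from its teaser ("lede") text,
# using the classifier gem's naive Bayes classifier. Page and its posts
# association appear to be ActiveRecord models of the host Rails app.
class LedeClassifier
  attr_reader :classifier

  # Trains on the n oldest posts of each section
  def initialize(sections, n)
    @classifier = Classifier::Bayes.new(*sections)
    sections.each { |s| train_with(training_wheels(s, n), s) }
  end

  # Calls the gem's dynamic train_<category> method, e.g. train_news
  def train_with(wheels, s)
    s = s.downcase
    wheels.each { |lede| @classifier.send("train_#{s}", lede) }
  end

  # Teasers of the n oldest posts (lowest ids) in a section
  def training_wheels(section, n)
    Page.find_by_name(section).posts.find(:all, :order => "id asc", :limit => n).map(&:teaser)
  end
end
```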
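`LedeClassifier` only runs inside the host Rails app, so here is a standalone sketch of the idea `Classifier::Bayes` is built on: multinomial naive Bayes over word counts with add-one smoothing, choosing the category with the highest log prior plus summed log likelihoods. `TinyBayes` is a hypothetical name and a simplified stand-in, not the gem's implementation; among other differences, it skips stemming. The training sentences are made up for illustration.

```ruby
# Hypothetical TinyBayes: a simplified multinomial naive Bayes classifier,
# NOT the classifier gem's implementation. No stemming; words are lowercase
# runs of letters and apostrophes.
class TinyBayes
  def initialize(*categories)
    @categories  = categories
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => word => count
    @total_words = Hash.new(0)                             # category => words seen
    @doc_counts  = Hash.new(0)                             # category => documents seen
    @vocabulary  = {}                                      # every word seen in training
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @total_words[category] += 1
      @vocabulary[word] = true
    end
  end

  # Picks the category with the highest log prior + sum of log likelihoods
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum
    @categories.max_by do |cat|
      log_prior = Math.log((@doc_counts[cat] + 1.0) / (total_docs + @categories.size))
      words.sum(log_prior) do |w|
        # Add-one (Laplace) smoothing keeps unseen words from zeroing a category
        Math.log((@word_counts[cat][w] + 1.0) / (@total_words[cat] + @vocabulary.size))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

nb = TinyBayes.new("Sports", "News")
nb.train("Sports", "The team won the game in overtime")
nb.train("Sports", "Coach praises goalie after shutout win")
nb.train("News", "City council approves new budget")
nb.train("News", "Mayor announces council election date")
puts nb.classify("goalie makes save as team wins game") # prints Sports
```

Working in log space avoids floating-point underflow when a teaser has many words, and the smoothing term means a single unseen word lowers a category's score instead of ruling it out.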
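```ruby
# tests: train on the oldest 3000 posts per section, then print the
# predicted section for each of the 10 newest posts in every section.
# Note: training and test posts overlap for any section with fewer than
# 3010 posts. State is passed as plain locals because accessing @@class
# variables from the top level raises a RuntimeError in Ruby 3.0+.
sections = ["News", "Viewpoints", "Voices", "Sports"]
classifier = LedeClassifier.new(sections, 3000).classifier
output = ""

def test_section(classifier, output, name, n)
  output << "-" * 80 + "\n" + name.upcase + "\n"
  teasers = Page.find_by_name(name).posts.find(:all, :order => "id desc", :limit => n).map(&:teaser)
  teasers.each { |t| test_lede_classification(classifier, output, t) }
end

def test_lede_classification(classifier, output, teaser)
  output << classifier.classify(teaser) + "\n"
end

sections.each { |s| test_section(classifier, output, s, 10) }
puts output
```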
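The test script prints predicted labels under each section header, leaving the reader to compare them by eye. A sketch of scoring them instead: `accuracy_report` is a hypothetical helper that takes a hash of true section => predicted labels and returns per-section and overall accuracy. The input below is invented toy data, not output from this gist.

```ruby
# Hypothetical accuracy_report: takes { true_section => [predicted labels] }
# and returns [per-section accuracy hash, overall accuracy].
def accuracy_report(predictions)
  correct_total = 0
  count_total = 0
  per_section = {}
  predictions.each do |section, labels|
    correct = labels.count { |label| label == section }
    per_section[section] = labels.empty? ? 0.0 : correct.to_f / labels.size
    correct_total += correct
    count_total += labels.size
  end
  overall = count_total.zero? ? 0.0 : correct_total.to_f / count_total
  [per_section, overall]
end

# Toy input for illustration only, not results from the gist
predictions = {
  "News"   => ["News", "News", "Viewpoints", "News"],
  "Sports" => ["Sports", "Sports", "Sports", "News"]
}
per_section, overall = accuracy_report(predictions)
per_section.each { |section, acc| puts format("%-10s %5.1f%%", section, acc * 100) }
puts format("%-10s %5.1f%%", "Overall", overall * 100)
```

Inside the gist's script, this hash could be filled by mapping the classifier's `classify` over each section's teasers, in place of appending labels to the output string.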