@purcell
Created November 24, 2009 16:21
Hacky script to count words in RSS feeds
#!/usr/bin/env ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'
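class String
  # Unescape &lt;/&gt;, blank out the resulting tags, and replace
  # double-escaped numeric entities (e.g. &amp;#8217;) with a non-word character.
  def strip_tags
    gsub(/&lt;/, '<').gsub(/&gt;/, '>').gsub(/<.*?>/, ' ').gsub(/&amp;#\d{4};/, '~')
  end
end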
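class Hpricot::Elem
  # Memoized count of word-like tokens. Escaped markup is stripped first so
  # that tag names and entity fragments such as "lt" and "gt" are not counted.
  def inner_word_count
    @inner_word_count ||= inner_html.strip_tags.scan(/\w+/).size
  end
end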
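module Enumerable
  # Ruby 2.4+ ships Enumerable#sum; define it only on older rubies.
  unless method_defined?(:sum)
    def sum
      inject(0) { |total, val| total + val }
    end
  end
end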
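uri = ARGV.first or abort "Usage: #{File.basename($0)} FEED_URL"
# Kernel#open stopped accepting URLs in Ruby 3.0; open-uri provides URI.open (Ruby 2.5+).
doc = URI.open(uri) { |f| Hpricot(f) }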
(doc / 'item').each do |i|
  puts "#{(i / 'description').first.inner_word_count}: #{(i / 'title').inner_text}"
end
descrs = (doc / 'item' / 'description')
abort "No RSS item descriptions found" unless descrs.any?
sum = descrs.map { |d| d.inner_word_count }.sum
puts "\nAverage word count: #{sum / descrs.size.to_f}"