@mixonic
Created February 29, 2012 20:10
Rack::SpellCheck
=begin
Rack::SpellCheck - Spell check your HTML pages with Aspell, Nokogiri, and Rack.

This should probably be loaded in an initializer like so:

    if Rails.env.development?
      SpintoApp::Application.config.middleware.use Rack::SpellCheck
    end

And you should add Nokogiri and Raspell to your Gemfile:

    group :development do
      gem 'nokogiri'
      gem 'raspell'
    end

It creates log entries like:

    Started GET "/" for 127.0.0.1 at 2012-02-29 15:05:34 -0500
    Processing by Main::DashboardController#show as HTML
    Rendered main/dashboard/show.erb within layouts/marketing (5.0ms)
    Rendered layouts/_header.erb (31.3ms)
    Rendered layouts/_footer.erb (0.6ms)
    Completed 200 OK in 81ms (Views: 80.5ms)
    SpellCheck [workflow]: work flow, work-flow, workfare, workforce, workable, forkful
    SpellCheck [walkthrough]: walk through, walk-through, breakthrough, walkabout, Valkyrie, Walker
    SpellCheck [frontmatter]: front matter, front-matter, frontward, antimatter, fronted, frontier
    SpellCheck [subdomain]: sub domain, sub-domain, subhuman, subliming, subsuming, sideman
    SpellCheck [dns]: Dons, dens, dins, dons, duns, DNA
    SpellChecked in 0.244784 seconds.
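
How words are pulled out of the page text may be easier to see in isolation.
The sketch below reuses the regular expressions from spell_check without
Nokogiri or Aspell. The names URL_PATTERN, WORD_PATTERN and extract_words are
hypothetical and exist only for this illustration; the middleware itself does
this work inline.

```ruby
# Hypothetical helper, not part of Rack::SpellCheck. It mirrors the
# regular expressions that spell_check applies to each node's text.
URL_PATTERN  = %r{[a-zA-Z0-9\.:/]+\.(?:co|net|org)[a-zA-Z0-9\.:/?&%]+}
WORD_PATTERN = %r{[A-Za-z\u2019'&;]+}

def extract_words(text)
  text.
    # Strip out URLs so their fragments are not reported as misspellings.
    gsub(URL_PATTERN, '').
    # Collect runs of letters, apostrophes, and entity characters.
    scan(WORD_PATTERN).
    # Normalize curly and HTML-escaped apostrophes to a plain single quote.
    map { |word| word.gsub(%r{\u2019|&rsquo;}, "'") }
end

p extract_words("See http://example.com/docs for Spinto\u2019s workflow")
```

Each word returned this way is what would then be compared against the
whitelist and passed to the speller.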
=end
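# Bundler normally requires these in a Rails app; the explicit requires
# make the file loadable on its own.
require 'nokogiri'
require 'raspell'

class Rack::SpellCheck
  def initialize app
    @app = app
    @speller = Aspell.new("en_US")
    @speller.suggestion_mode = Aspell::NORMAL
    # Cache of words already checked, keyed by lowercased word.
    @misspellings = {}
    # Words that should never be reported.
    @whitelist = %w{
      Matt Beale Spinto Spinto's spinto
      matt beale www
      css js scss CSS SCSS SASS CoffeeScript coffeescript
      li td GitHub pre YAML CNAME
      yoursubdomain
    }
  end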
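
  def call env
    @app.call(env).tap do |response|
      begin
        if response[1]["Content-Type"] =~ /html/
          # Assumes the body object responds to #body, as a Rails response
          # does; a plain Rack body only guarantees #each.
          spell_check response[2].body
        end
      rescue StandardError => e
        # Never let a spell check failure break the request.
        Rails.logger.warn "SpellCheck failed: #{e.message}"
      end
    end
  end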

  def spell_check body
    started_at = Time.now
    # Parse full documents and partial fragments differently.
    dom = (if body =~ /<body/
      Nokogiri::HTML.parse(body)
    else
      Nokogiri::HTML.fragment(body)
    end)
    # Words already reported for this response, lowercased.
    reported_words = []
    dom.xpath('//*').each do |node|
      next unless node.text.present?
      node.text.
        # Strip out URLs.
        gsub(%r{[a-zA-Z0-9\.:/]+\.(?:co|net|org)[a-zA-Z0-9\.:/?&%]+}, '').
        # For each word.
        scan(%r{[A-Za-z\u2019'&;]+}) do |word|
          # Change HTML escaped and UTF-8 apostrophes to single quotes.
          word.gsub!(%r{\u2019|&rsquo;}, "'")
          key = word.downcase
          next if @whitelist.include?(word) || reported_words.include?(key)
          reported_words << key
          check_word word
        end
    end
    Rails.logger.info "SpellChecked in #{Time.now - started_at} seconds."
  end

  def check_word word
    key = word.downcase
    # Consult Aspell only once per word; correctly spelled words are cached
    # too, so they are not re-checked on every request.
    unless @misspellings.has_key?(key)
      @misspellings[key] = if @speller.check(word)
        { checked: true }
      else
        { checked: true, suggestions: @speller.suggest(word) }
      end
    end
    suggestions = @misspellings[key][:suggestions]
    log_misspelling word, suggestions if suggestions
  end

  def log_misspelling word, suggestions
    Rails.logger.warn "SpellCheck [#{word}]: #{suggestions[0..5].join(', ')}"
  end
end