@mazuhl
Created February 1, 2010 16:10
HTML cleaner for quick launcher - works with Windows clipboard
require 'rubygems'
require 'Win32API'
require 'win32/clipboard'
include Win32
text = Clipboard.data
=begin
= Ruby script to remove crummy HTML
A simple but effective script to remove crummy HTML (particularly from Word), useful for site editors, intranet managers, etc. Copy the text to your clipboard, run cleanup.rb (perhaps from a quick launcher like Launchy), then paste the cleaned text back in.
* remove style, class, type, align, width, height, originalpath, originalattribute and valign attributes
* strip o:p, div, font, span, sup, body, html, title, head, meta, script and style tags (only the tags; their contents are left in place)
* strip <?xml tag
* strip !DOCTYPE declaration
* strip Word smart-tag XML such as <st1:place>
* strip empty paragraphs
* strip <p><b></b></p> tag clusters
* strip <u> tags
=end
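# Drop presentational attributes; values may be double-quoted, single-quoted or unquoted.
text = text.gsub(/\s(class|type|align|style|width|height|valign|originalpath|originalattribute)=("[^"]*"|'[^']*'|[^\s>]*)/i, "").
  gsub(/<\/?(div|o:p|font|span|sup|body|html|title|head|script|style|meta)[^>]*>(&nbsp;)*/i, ""). # wrapper tags (contents kept)
  gsub(/<\?xml(.*?)>/i, "").           # <?xml ...> tags
  gsub(/<!DOCTYPE[^>]*>/i, "").        # DOCTYPE declaration
  gsub(/<\/?st1[^>]*>/i, "").          # Word smart tags such as <st1:place>
  gsub(/<p[^>]*>(&nbsp;)*<\/p>/i, ""). # empty paragraphs
  gsub(/<p><b><\/b><\/p>/i, "").       # empty bold paragraphs
  gsub(/<\/?u>/i, "")                  # underline tags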
Clipboard.set_data(text)
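
To try the cleanup without a Windows clipboard, the substitution chain can be wrapped in a method and fed a string directly. This is only a sketch: the method name `clean_html` and the sample markup are hypothetical and not part of the original script; the patterns are modelled on the ones in the script.

```ruby
# Hypothetical helper (not in the original gist): a gsub chain modelled on
# the script's, wrapped in a method so it can run on any platform.
def clean_html(text)
  text.gsub(/\s(class|type|align|style|width|height|valign|originalpath|originalattribute)=(['"](.*?)['"]|[^ >]*)/i, "").
    gsub(/<\/?(div|o:p|font|span|sup|body|html|title|head|script|style|meta)[^>]*>(&nbsp;)*/i, "").
    gsub(/<\?xml(.*?)>/i, "").
    gsub(/<!DOCTYPE[^>]*>/i, "").
    gsub(/<\/?st1[^>]*>/i, "").
    gsub(/<p[^>]*>(&nbsp;)*<\/p>/i, "").
    gsub(/<p><b><\/b><\/p>/i, "").
    gsub(/<\/*u>/i, "")
end

# Made-up sample of the kind of markup Word tends to produce.
sample = '<p class="MsoNormal"><span style="color:red">Hello <u>world</u></span></p><p>&nbsp;</p>'

puts clean_html(sample)
# prints: <p>Hello world</p>
```

Keeping the regex chain separate from the clipboard calls also makes it easy to check a new pattern against a few sample strings before adding it to the script.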