@walterdavis
Last active December 25, 2015 07:18
Post-process a Web site to "tidy" all the pages' HTML.
=begin
Initial setup (run once at the command prompt, as an admin user).

First, check your environment with

    gem env

If you get an error at this point, you most likely don't have
RubyGems installed, and you should install it. The easiest way to
get everything ready at once is to install RVM. Instructions
are at http://rvm.io

If you have RVM (and you will know if you do):

    gem install nokogiri-pretty

If you don't (you are just using the system Ruby on your Mac):

    sudo gem install nokogiri-pretty

Any errors at this point will be instructive in
getting the script to work -- don't ignore them!

To use:

Call this script like this (from the same
folder where you saved it):

    ruby ./tidy_site.rb /path/to/the/site

If your path includes any spaces, be sure to
surround the path with double quotes:

    ruby ./tidy_site.rb "/Users/waltd/Documents/Project Name/Site Folder"
=end
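require 'rubygems'
require 'fileutils'
require 'nokogiri-pretty'

# The site folder is the first command-line argument.
# to_s keeps a missing argument from raising NoMethodError on nil.
base = ARGV[0].to_s.chomp

# File.exists? was deprecated and later removed from Ruby; use File.exist?.
if !File.exist?(base)
  puts 'Missing Site Folder!'
else
  # Visit every .html file under the site folder, at any depth.
  Dir.glob(File.join(base, '**/*.html')).each do |page|
    doc = Nokogiri::XML(File.read(page))
    # Pretty-print the document, then drop the XML declaration so the
    # page starts with the markup itself.
    clean = doc.human.sub('<?xml version="1.0" encoding="UTF-8"?>', '').strip
    # Overwrite the page in place with the tidied markup.
    File.open(page, 'w') { |f| f.write(clean) }
  end
  puts "#{base} site processed!"
end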
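
The script overwrites every page in place, so a failed parse can cost you the original markup. It also requires 'fileutils' without using it. As a minimal sketch, and not part of the original script, the hypothetical helper below copies each page to a ".bak" file before anything is rewritten. The name `backup_pages` and the ".bak" suffix are assumptions made for illustration. Only the Ruby standard library is used.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (an assumption, not in the original script):
# copy every .html page under +base+ to "<page>.bak" and return the
# list of pages, so a tidy pass can be undone by restoring the copies.
def backup_pages(base)
  unless base && File.directory?(base)
    raise ArgumentError, "Missing Site Folder: #{base.inspect}"
  end

  # Same recursive pattern as the script: .html files at any depth.
  Dir.glob(File.join(base, '**/*.html')).sort.each do |page|
    FileUtils.cp(page, "#{page}.bak")
  end
end
```

One way to use it would be to call `backup_pages(base)` just before the `Dir.glob` loop. Because the backups end in ".bak", the '**/*.html' pattern does not pick them up on a later run.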