@venj
Created January 25, 2011 01:45
Helper script to convert HTML to UTF-8 XHTML
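require 'fileutils'
# Expects one argument: a directory that contains an OEBPS folder of *.html files.
abort "usage: ruby #{$0} DIRECTORY" if ARGV[0].nil?
FileUtils.cd ARGV[0] do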
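  # Pass 1: convert each HTML file to XHTML with HTML Tidy.
  FileUtils.cd "OEBPS" do
    Dir["*.html"].each do |file|
      # Passing arguments as a list bypasses the shell, so file names
      # containing spaces or quotes are handled safely.
      ran = system("tidy", "-asxhtml", "--output-encoding", "utf8",
                   "--indent", "1", "--escape-cdata", "1",
                   "--drop-proprietary-attributes", "1", "--drop-font-tags", "1",
                   file, out: "#{file}.xml", err: File::NULL)
      abort "tidy could not be run; is it installed?" if ran.nil?
      #system("html2xhtml -t 1.1 #{file}.xml -o #{file}")
      # Replace the original file with the tidied output.
      FileUtils.mv "#{file}.xml", file
    end
  end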
  # Pass 2: label the content as UTF-8 and swap in the XHTML 1.1 prolog.
  FileUtils.cd "OEBPS" do
    Dir["*.html"].each do |file|
      content = File.read(file)
      # Only the first charset label is rewritten.
      content.sub!("us-ascii", "UTF-8")
      # Drop the DOCTYPE that tidy emitted, plus any whitespace after it.
      content.sub!(/<!DOCTYPE.*?>\s*/m, "")
      prolog = <<END
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
END
      File.write(file, prolog + content)
    end
  end
end
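
The prolog rewrite in the second pass can be tried on its own, without tidy or an OEBPS directory. The following is a minimal sketch, not part of the original script: the helper name `normalize_prolog`, the constant `XHTML11_PROLOG`, and the sample markup are all assumptions made for illustration.

```ruby
# Hypothetical helper; the names here are illustrative and do not
# appear in the original script.
XHTML11_PROLOG = <<~PROLOG
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
PROLOG

def normalize_prolog(content)
  # Rewrite the first charset label only, as the script does.
  body = content.sub("us-ascii", "UTF-8")
  # Remove the first DOCTYPE declaration and the whitespace after it.
  body = body.sub(/<!DOCTYPE.*?>\s*/m, "")
  XHTML11_PROLOG + body
end

# Made-up input resembling a tidied XHTML 1.0 page.
sample = <<~HTML
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii" /></head>
  <body><p>hi</p></body>
  </html>
HTML

puts normalize_prolog(sample)
```

Keeping the string transformation separate from the file I/O makes it possible to check the regular expression against sample markup before running the script over real files.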