@epochwolf
Created October 1, 2013 04:00
A script to extract the text from a local copy of unitysaga.com into a single HTML file suitable for conversion to an ebook. Never underestimate the lengths to which a nerd will go to get something on their Kindle.
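The script in this gist reads everything from the current working directory. It needs the stylesheet `tools/articlestyle.css`, one index page `<folder>.asp.html` per book, and that book's chapter pages in `<folder>/*.html`. A missing file only shows up as an exception partway through the run. A small preflight check can catch this before any parsing starts. The following is a minimal sketch that assumes the same six folder names the script hard-codes. `missing_inputs` and `BOOK_FOLDERS` are hypothetical names and are not part of the original.

```ruby
# Hypothetical preflight check (not part of the original script).
# Assumes the same directory layout and book folders the extractor uses.
BOOK_FOLDERS = %w[wwe aao boh dof pl sotn].freeze

# Returns the input paths (or glob patterns) that are missing under +root+.
def missing_inputs(root = ".", folders = BOOK_FOLDERS)
  required = [File.join(root, "tools", "articlestyle.css")]
  folders.each do |folder|
    required << File.join(root, "#{folder}.asp.html")
  end
  missing = required.reject { |path| File.file?(path) }

  # Each book also needs at least one chapter page in <folder>/*.html
  folders.each do |folder|
    pattern = File.join(root, folder, "*.html")
    missing << pattern if Dir.glob(pattern).empty?
  end
  missing
end
```

Run it from the directory that holds the mirror. An empty array means every input the script reads is in place. Otherwise, each entry names a file or chapter pattern that was not found.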
#!/usr/bin/env ruby
# A script to extract the text from a copy of unitysaga.com into a single file.
# Copyright (C) 2013 epochwolf
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
require 'rubygems' # only needed on Ruby 1.8; harmless on newer versions
require 'nokogiri'

File.open("output.html", 'w') do |f|
  f.write "<html>\n"
  f.write "<head>\n"
  f.write "<style>\n#{File.read "tools/articlestyle.css"}\n"
  # Start each book's title page and each chapter on a new page
  f.write ".book, .chapter{ page-break-after: always }\n"
  f.write "</style>\n"
  f.write "</head>\n"
  f.write "<body>\n"
  f.write "<div class=\"opening\">\n"
  f.write "<h1>The Unity Saga by Charles Sonnenburg</h1>\n"
  f.write "<h2>Repackaged by epochwolf</h2>\n"
  f.write "<p>\n"
  f.write "All copyright claims belong to Charles Sonnenburg.\n"
  f.write "This repackage is for epochwolf's personal use and no one else's.\n"
  f.write "Never underestimate the lengths to which a nerd will go to get something on their Kindle.\n"
  f.write "</p>\n"
  f.write "</div>\n"

  # One entry per book: an index page (<folder>.asp.html) holding the title
  # and back-cover text, plus a directory of chapter pages (<folder>/*.html)
  %w[wwe aao boh dof pl sotn].each do |folder|
    chapters = []
    puts "Processing #{folder}.asp.html"
    raw_html = File.read("#{folder}.asp.html")
    doc = Nokogiri::HTML::Document.parse(raw_html)
    book_title = doc.css("#header h4").inner_html
    back_cover = doc.css("#unity").inner_html

    # Dir[] order is not guaranteed on older Rubies, so sort explicitly.
    # Note: this is a lexical sort ("10.html" sorts before "2.html").
    Dir["#{folder}/*.html"].sort.each do |file|
      puts "Processing #{file}"
      raw_html = File.read(file)
      doc = Nokogiri::HTML::Document.parse(raw_html)
      main_div = doc.css('#main').first
      next if main_div.nil? # skip pages without a #main container
      main_div.remove_attribute "id"
      main_div["class"] = "chapter"
      # Demote chapter titles by one level
      main_div.css("h1").each do |node|
        node.name = "h2"
      end
      # Rename chapter title containers
      main_div.css("div.vidtitle").each do |node|
        node["class"] = "chapter_title"
      end
      chapters << main_div.to_html
    end

    # Title page: book title plus back-cover blurb (closed before the chapters)
    f.write "<div class=\"book\">\n"
    f.write "<div class=\"book_title\"><h1>#{book_title}</h1></div>\n"
    f.write back_cover
    f.write "</div>\n"
    chapters.each do |chapter|
      f.write chapter
      f.write "\n\n\n\n"
    end
  end
  f.write "</body>\n"
  f.write "</html>\n"
end
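Chapter order in the output comes from the order in which the script visits `<folder>/*.html`. Ruby versions before 3.0 return glob results in filesystem order. A plain lexical sort, in turn, places `ch10.html` before `ch2.html`. The real chapter file names on the mirrored site are not shown here, so the numbering scheme is an assumption. If the names do embed chapter numbers, a natural sort keeps them in reading order. The following is a minimal sketch. `natural_sort_key` and `natural_sort` are hypothetical helpers and are not part of the original.

```ruby
# Hypothetical helpers (not in the original script): order file paths so that
# runs of digits in the base name compare as numbers, e.g. "ch2" < "ch10".
def natural_sort_key(path)
  File.basename(path).scan(/\d+|\D+/).map do |part|
    # Tag each chunk so integers and strings are never compared directly
    part.match?(/\A\d+\z/) ? [0, part.to_i] : [1, part]
  end
end

def natural_sort(paths)
  paths.sort_by { |path| natural_sort_key(path) }
end
```

To use it, iterate over `natural_sort(Dir["#{folder}/*.html"])` instead of the raw glob result. Tagging each chunk as `[0, integer]` or `[1, string]` ensures that `Array#<=>` never has to compare an Integer with a String.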