require 'rubygems'
require 'anemone'
require 'nokogiri' # used below to parse each crawled page
load 'global.rb' # contains the skip_links_like regex
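# global.rb is not included in this gist. As a minimal sketch only, and assuming
# it may leave $regex unset, fall back to a HYPOTHETICAL pattern that skips
# typical MediaWiki housekeeping links (index.php actions and namespaced pages
# such as Special: or Talk:). The author's real pattern may differ.
$regex ||= %r{/index\.php|/(?:Special|Talk|User|Category|Template|Help):}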
# get all the html files that we need
Dir.mkdir('output') unless File.directory?('output') # File.open below fails if this is missing
Anemone.crawl("http://codex.wordpress.org/Function_Reference/") do |anemone|
  anemone.skip_links_like $regex
  anemone.on_pages_like %r{http://codex\.wordpress\.org/Function_Reference/} do |page|
    puts page.url
    # only keep pages whose last path segment looks like a function name
    matched = page.url.to_s.match(%r{http://codex\.wordpress\.org/Function_Reference/([_a-zA-Z0-9]+)$})
    if matched
      filename = matched[1]
      puts filename
      doc = Nokogiri::HTML(page.body)
      # save only the element that holds the page content, not the site chrome
      doc.css('#bodyContent').each do |body|
        File.open("output/#{filename}.html", 'w') { |f| f.write(body.to_html) }
      end
    end
  end
end
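# now create one big HTML file
files = Dir.glob('output/*.html').sort # sort so the page order is stable across runs
# 'w' truncates a bigfile.html left over from a previous run instead of appending to it
File.open('bigfile.html', 'w') do |big|
  files.each { |file| big << File.read(file) }
end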
# now create the PDF file from this big HTML file
`wkhtmltopdf bigfile.html wordpress_function_reference.pdf --encoding UTF8 --page-size Letter`