@Orangenhain
Created May 28, 2011 15:02
Fetch episodes from ASCIIcasts.com; use the existing PDF where there is one, or create one
#!/usr/bin/env ruby -U
# fetch available Railscasts transcripts ( ASCIIcasts ); use existing PDF where available
# original concept: https://github.com/korin/casts/blob/master/asciicasts/fetch.rb
# get wkhtmltopdf from: http://code.google.com/p/wkhtmltopdf/downloads/list
require 'uri'
require 'open-uri'
require 'nokogiri'
require 'pdfkit'
BASE_URL = "http://asciicasts.com"
USER_CSS_FILE = "/tmp/_fetch_asciicasts_print.css"
# Truncate +string+ to at most +count+ characters, ending in an ellipsis.
def shorten(string, count = 30)
  return string if string.length <= count
  ellipsis = '...'
  string[0, count - ellipsis.length] + ellipsis
end
def filename(url, title, episode_number)
  File.basename(url) + ".pdf" # could also be: "##{episode_number} #{title}.pdf"
end
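# Save the episode at +url+ as a PDF at +dest+: download the PDF linked from
# the page when there is one, otherwise render the page with PDFKit.
def save_pdf(url, dest)
  # URI.open (Ruby 2.5+): Kernel#open no longer fetches URLs on Ruby 3.0+.
  doc = Nokogiri::HTML(URI.open(url))
  doc.xpath('//p[@id="otherFormats"]/a').each do |link|
    uri = URI.parse(link['href'])
    next unless File.extname(uri.path) =~ /\.pdf/

    File.open(dest, "wb") { |file| file.write(URI.open(uri.to_s).read) }
    return true
  end

  # No ready-made PDF: render the page, applying the user stylesheet.
  user_css_file_url = "file://#{USER_CSS_FILE}"
  kit = PDFKit.new(url, :margin_bottom => 0, :margin_left => 0, :margin_right => 0, :margin_top => 0,
                        :orientation => "Landscape", :user_style_sheet => user_css_file_url)
  kit.to_file(dest)
  true
end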
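=begin
A minimal, offline sketch of the link-selection step inside save_pdf, kept in
a comment block so the script itself is unchanged. It assumes the hrefs have
already been pulled out of the page; first_pdf_href and the example.com URLs
are hypothetical and not part of the original script. Unlike the regex test
above, it requires the extension to be exactly ".pdf".

```ruby
require 'uri'

# Hypothetical helper: return the first href whose URL path ends in ".pdf",
# or nil when no such link exists.
def first_pdf_href(hrefs)
  hrefs.find { |href| File.extname(URI.parse(href).path.to_s) == '.pdf' }
end

links = ['/episodes/1.textile', 'http://example.com/files/1.pdf?dl=1']
puts first_pdf_href(links).inspect
puts first_pdf_href(['/episodes/2.html']).inspect
```

A nil result corresponds to the case where save_pdf falls through to PDFKit.
=end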
File.open(USER_CSS_FILE, "w") { |file| file.write(DATA.read) }
doc = Nokogiri::XML(URI.open("#{BASE_URL}/episodes/all")) # URI.open: Kernel#open no longer fetches URLs on Ruby 3.0+
doc.css('dl#episodeArchive > dt').each do |item|
  episode_number = item.xpath('text()').text # first text node
  link = item.css('a').first
  url = BASE_URL + link['href']
  title = link.text
  # The description, if any, is the <dd> that follows this <dt>.
  next_element = item.next_element rescue nil
  description = next_element.text if next_element && next_element.name == 'dd'
  # TODO: scrape tags, to be set in resulting pdf
  file_name = filename(url, title, episode_number)
  # dest = File.join(File.dirname(__FILE__), "PDFs", file_name)
  dest = File.join(File.dirname(__FILE__), file_name)
  next if File.exist?(dest) # File.exists? was removed in Ruby 3.2
  puts(format("working on #%03d: %42s => %s", episode_number, shorten(title, 40), dest))
  save_pdf(url, dest)
end
__END__
/*
img { page-break-before: avoid; page-break-inside: avoid; }
.dp-highlighter { page-break-before: avoid; page-break-inside: avoid; }
p { page-break-before: avoid; page-break-inside: avoid; }
*/