@caindy
Created April 17, 2012 20:09
Quick hack to download the PDF course materials linked from the Stanford CS193P (Fall 2010) downloads page.
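require 'hpricot'
require 'open-uri'
require 'parallel'
require 'fileutils'

overall_start_time = Time.now
uri = "http://www.stanford.edu/class/cs193p/cgi-bin/drupal/downloads-2010-fall"
puts "Downloading #{uri}"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
doc = Hpricot(URI.open(uri))
# Every link whose href contains "pdf"
pdfs = doc/"//a[@href*=pdf]"

FileUtils.mkdir_p('download')
Dir.chdir('download')

# Fetch up to 8 files concurrently
Parallel.each(pdfs, in_threads: 8) do |e|
  # Resolve relative links against the page URL
  pdf_uri = URI.join(uri, e['href'])
  # Name the file after the last path segment; the link text may contain HTML or slashes
  file_name = File.basename(pdf_uri.path)
  start_time = Time.now
  puts "Downloading #{file_name}"
  File.open(file_name, 'wb') do |file|
    file << URI.open(pdf_uri).read
  end
  elapsed_time = Time.now - start_time
  puts "#{file_name} downloaded in #{elapsed_time} seconds"
end
puts "Finished in #{Time.now - overall_start_time} seconds"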
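The script uses the `parallel` gem's `in_threads: 8` option to run eight downloads at once. If you want to avoid the gem, or just see the idea spelled out, here is a minimal sketch of a fixed-size thread pool built only from Ruby's core `Thread` and `Queue`. `each_in_threads` is a hypothetical helper name, and this is not taken from the `parallel` gem's source.

```ruby
# Hypothetical helper: run the block once per item using a fixed pool of
# worker threads, similar in spirit to Parallel.each(items, in_threads: n).
def each_in_threads(items, threads: 8, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  # After close, pop returns nil once the queue is drained, so workers exit.
  # Caveat: a nil or false item would also stop a worker early.
  queue.close

  workers = Array.new(threads) do
    Thread.new do
      while (item = queue.pop)
        block.call(item)
      end
    end
  end
  # join waits for every worker and re-raises any exception from a block
  workers.each(&:join)
  items
end

if __FILE__ == $PROGRAM_NAME
  # Queue is thread-safe, so workers can push results into it concurrently
  results = Queue.new
  each_in_threads((1..20).to_a, threads: 4) { |n| results << n * n }
  squares = Array.new(results.size) { results.pop }
  puts squares.sort.inspect
end
```

In the download loop, the block would fetch each `pdf_uri` and write the file. Because the work is network-bound, Ruby threads still overlap the waiting even under the global interpreter lock.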