@ser1zw
Created June 8, 2010 16:03
## Download the PDF files of Ubuntu Magazine Japanese vol.3
## from its index page: http://ubuntu.asciimw.jp/elem/000/000/010/10231/
require 'open-uri'
BASE_URL = 'http://ubuntu.asciimw.jp/elem/000/000/010/10231/'
PDF_URL_REG = /href="(koukai\/[^"]+?\.pdf)"/ # relative PDF links; [^"] stops a match at the closing quote
OUT_DIR = 'ubumaga_vol3'
# URI.open (Ruby 2.5+) is needed for URLs: since Ruby 3.0, Kernel#open no longer goes through open-uri.
html_body = URI.open(BASE_URL) { |f| f.read }
puts "#{BASE_URL} >> DOWNLOADED."
Dir.mkdir(OUT_DIR) unless Dir.exist?(OUT_DIR)
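$stdout.sync = true # show each progress message as soon as it is printed
# scan returns one capture array per match; flatten gives the relative paths, uniq drops repeated links
html_body.scan(PDF_URL_REG).flatten.uniq.each do |pdf_path|
  url = BASE_URL + pdf_path
  filename = File.basename(pdf_path)
  print "[#{url}] START >> "
  pdf = URI.open(url, 'rb') { |f| f.read }
  print 'DOWNLOADED >> '
  File.binwrite(File.join(OUT_DIR, filename), pdf)
  puts 'DONE.'
end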
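The script above aborts on the first failed request and re-downloads every file on each run. A more defensive variant might look like the sketch below. It is an illustration, not the author's code: the helper names `extract_pdf_links`, `pdf_targets` and `download_pdfs` are hypothetical, it assumes Ruby 2.5 or later for `URI.open`, and it assumes the index page still links its PDFs as relative `koukai/<name>.pdf` hrefs.

```ruby
require 'fileutils'
require 'net/http'
require 'open-uri'
require 'uri'

# Hypothetical helper (not in the original gist): unique relative PDF links
# under koukai/. [^"] keeps a match from running past the closing quote.
def extract_pdf_links(html)
  html.scan(/href="(koukai\/[^"]+?\.pdf)"/).flatten.uniq
end

# Hypothetical helper: pair each absolute PDF URL with its local output path.
# URI.join resolves the relative href against the index page URL.
def pdf_targets(base_url, out_dir, html)
  extract_pdf_links(html).map do |href|
    url = URI.join(base_url, href)
    [url, File.join(out_dir, File.basename(url.path))]
  end
end

# Hypothetical helper: download every PDF, skipping files that already exist
# and reporting (instead of raising) network errors for individual files.
def download_pdfs(base_url, out_dir)
  html = URI.open(base_url) { |f| f.read }
  FileUtils.mkdir_p(out_dir)
  pdf_targets(base_url, out_dir, html).each do |url, path|
    if File.exist?(path)
      puts "[#{url}] SKIPPED (already downloaded)"
      next
    end
    begin
      # The whole body is read before anything is written, so a failed
      # request does not leave a truncated file behind.
      data = URI.open(url, 'rb') { |f| f.read }
      File.binwrite(path, data)
      puts "[#{url}] DONE."
    rescue OpenURI::HTTPError, SocketError, SystemCallError,
           Net::OpenTimeout, Net::ReadTimeout => e
      warn "[#{url}] FAILED: #{e.message}"
    end
  end
end
```

With the constants from the script, `download_pdfs(BASE_URL, OUT_DIR)` would perform the whole run. Because existing files are skipped, rerunning it after a network error only fetches the PDFs that are still missing.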