@hotchpotch
Created September 18, 2020 23:43
Convert カメラバカにつける薬 in デジカメ Watch to a PDF
#!/usr/bin/env ruby
# Requirements:
# - nokogiri (rubygems)
# - img2pdf
# - pngquant (optional)
require 'pathname'
require 'open-uri'
require 'nokogiri'
require 'time'
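# Directory containing this script; the image cache lives next to it.
ROOT_PATH = Pathname.new(__FILE__).parent
IMAGE_CACHE_PATH = ROOT_PATH.join('image_caches')
IMAGE_CACHE_PATH.mkpath
# Yearly index pages to scan, from 2017 through the current year.
INDEX_RANGE = 2017..(Time.now.year)
# Seconds to wait between image downloads.
SLEEP_WAIT = 1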
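# Collect the entry-page URLs linked from each yearly index page.
def get_entry_urls(index_range)
  # Escape the dots so they match literally rather than any character.
  path_re = %r{dc\.watch\.impress\.co\.jp/docs/comic/clinic/\d+\.html}
  index_range.flat_map do |year|
    index_url = "https://dc.watch.impress.co.jp/docs/comic/clinic/index#{year}.html"
    # Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open from open-uri.
    body = URI.open(index_url).read
    links = Nokogiri::HTML(body).css('a').map { |e| e.attr('href') }
    # Anchors without an href yield nil, so drop them before matching.
    links.compact.grep(path_re)
  end
end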
entry_urls = get_entry_urls(INDEX_RANGE).flatten.sort.uniq
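entry_urls.each do |url|
  puts "process: #{url}"
  # Split the seven digits of the entry ID 4 + 3 to build the image path.
  _, path1, path2 = *url.match(/(\d{4})(\d{3})\.html/)
  (1..99).each do |i|
    cache_path = IMAGE_CACHE_PATH.join("#{path1}_#{path2}_%02d.png" % i)
    image_url = "https://dc.watch.impress.co.jp/img/dcw/docs/#{path1}/#{path2}/%02d.png" % i
    if cache_path.exist?
      # A cached page means this entry was fetched on an earlier run; skip the entry.
      puts "cache found!"
      break
    else
      puts "get image: #{image_url}"
      begin
        image = URI.open(image_url).read
        # Write in binary mode; puts appends a newline and text mode can alter bytes.
        cache_path.binwrite(image)
      rescue OpenURI::HTTPError
        # An HTTP error (typically 404) is treated as the end of this entry's pages.
        warn "image can't load, next"
        break
      end
      sleep SLEEP_WAIT
    end
  end
end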
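# Glob inside the cache directory used above, not relative to the working directory.
image_files = Pathname.glob(IMAGE_CACHE_PATH.join('*.png').to_s).map(&:to_s).sort
# pngquant is optional: system returns nil when it is not installed and the script continues.
pngquant_cmd_base = %w(pngquant --ext .png --force)
pngquant_cmd = pngquant_cmd_base + image_files
puts pngquant_cmd_base.join(' ') + " [...#{image_files.size} png files]"
system(*pngquant_cmd)
pdfname = Time.now.strftime('camera-baka-%Y_%m%d_%H%M%S') + ".pdf"
# Make sure the output directory exists before img2pdf writes into it.
dist_path = ROOT_PATH.join('dist')
dist_path.mkpath
pdf_cmd_base = ['img2pdf', '--output', dist_path.join(pdfname).to_s]
pdf_cmd = pdf_cmd_base + image_files
puts pdf_cmd_base.join(' ') + " [...#{image_files.size} png files]"
system(*pdf_cmd)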
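
The script maps each entry URL to its page images by splitting the seven digits of the entry ID into a four-digit and a three-digit directory. The sketch below isolates that mapping so it can be checked without touching the network. `image_url_for` and `IMAGE_BASE` are hypothetical names that do not appear in the script, and the entry ID `1234567` is made up for illustration.

```ruby
# Hypothetical helper, not part of the original script: builds the page-image
# URL for an entry URL using the same 4 + 3 digit split as the main loop.
IMAGE_BASE = 'https://dc.watch.impress.co.jp/img/dcw/docs'

def image_url_for(entry_url, page)
  m = entry_url.match(/(\d{4})(\d{3})\.html/)
  return nil unless m # not an entry page (for example, a yearly index page)

  format('%s/%s/%s/%02d.png', IMAGE_BASE, m[1], m[2], page)
end

# The entry ID below is invented for the example.
puts image_url_for('https://dc.watch.impress.co.jp/docs/comic/clinic/1234567.html', 1)
```

Returning `nil` for a URL that does not match makes it possible to skip non-entry links explicitly instead of building a URL from empty path components.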