@YuheiNakasaka
Created April 17, 2018 02:33
A Ruby script that downloads all of the Iraq daily report (イラク日報) PDFs
require "nokogiri"
require "open-uri"
require "fileutils"

# Create the directory the PDFs are downloaded into.
download_dir = "/home/hoge/iraq-nippo-list"
FileUtils.mkdir_p(download_dir) # no-op if the directory already exists

# Collect the PDF links from the article page.
# URI.open is used because Kernel#open no longer opens URLs as of Ruby 3.0.
html = URI.open("https://www.asahi.com/articles/ASL4J669JL4JUEHF016.html").read
doc = Nokogiri::HTML(html)
links = doc.css("td.link a").map do |link|
  link.attribute("href").value
end

# Download each PDF and save it. The filename is the URL path after the
# common prefix, with slashes replaced by underscores.
links.each do |link|
  pdf_data = URI.open(link).read
  filename = link.sub(%r!\Ahttps://www\.asahicom\.jp/news/esi/ichikijiatesi/!, "").gsub("/", "_")
  File.open(File.join(download_dir, filename), "wb") do |file|
    file.write(pdf_data)
  end
end
@YuheiNakasaka
Author

This assumes that anyone who can use Ruby probably already has Nokogiri installed.
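
The filename mapping in the script can be tried on its own, without Nokogiri or network access. The following is a minimal sketch, not part of the original script: the helper name `local_filename`, the constant `PREFIX`, and the example path are all hypothetical. It uses `String#delete_prefix` (Ruby 2.5+) in place of the script's regex, which strips the same URL prefix when the link starts with it.

```ruby
# Hypothetical helper (not in the original script): turn a PDF URL into a
# flat local filename by dropping the common URL prefix and replacing
# the remaining slashes with underscores.
PREFIX = "https://www.asahicom.jp/news/esi/ichikijiatesi/"

def local_filename(link)
  link.delete_prefix(PREFIX).tr("/", "_")
end

# The path below is a made-up example, not a real link from the article.
puts local_filename("#{PREFIX}dir/sub/001.pdf")
# prints "dir_sub_001.pdf"
```

Flattening the path this way keeps every PDF in a single directory while still keeping names unique, as long as the remote paths themselves are unique.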
