@yuya-matsushima
Created March 21, 2012 13:00
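Downloads the 中之条議会だより (Nakanojo town council newsletter) PDFs from the site that publishes them. There are many files, so it takes a while.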
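# encoding: utf-8
require 'nokogiri'
require 'open-uri'

# Index page listing the newsletter PDFs, and the base URL their links are relative to
index_url = 'http://www.town.nakanojo.gunma.jp/gikai/gikaidayori/top.html'
base = 'http://www.town.nakanojo.gunma.jp/gikai/gikaidayori/'
name = '中之条議会だより'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
html = Nokogiri::HTML(URI.open(index_url))
html.css('a').each do |link|
  # Read the href as a plain String; skip anchors that are not PDF links
  href = link['href'].to_s
  next unless href =~ /\.pdf\z/

  # Build the title from the file name without its extension
  # (no full-width to half-width conversion is actually performed here)
  title = File.basename(href, '.pdf')
  File.open(title + name + '.pdf', 'wb') do |output|
    URI.open(base + href) do |data|
      output.write(data.read)
    end
  end
end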
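
The comment above the title line mentions converting full-width characters to half-width, although the script itself only strips the `.pdf` extension. A minimal sketch of one way to do that conversion follows. It assumes Ruby 2.2 or later for `String#unicode_normalize`, and the helper name `to_halfwidth` is hypothetical, not part of the original script:

```ruby
# Hypothetical helper (not in the original gist): use Unicode NFKC
# normalization to map full-width ASCII letters and digits to their
# half-width forms. Kanji and other ordinary characters are unchanged.
def to_halfwidth(text)
  text.unicode_normalize(:nfkc)
end

puts to_halfwidth('第１２３号') # prints "第123号"
```

NFKC also rewrites other compatibility characters (for example, half-width katakana become full-width), so this may change more than digits and letters. It could be applied to `title` before the output file name is built.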