@sasamijp
Created September 28, 2014 11:32
Fetch every archive URL and write them to the database
# -*- encoding: utf-8 -*-
require 'nokogiri'
require 'open-uri'
require 'sequel'
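# Insert each URL into the `url` table of the SQLite database `dbname`.
# The table (with a `value` column) must already exist.
def insert(dbname, urls)
  db = Sequel.connect("sqlite://#{dbname}")
  urls.each do |url|
    db[:url].insert(value: url)
  end
end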
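# Fetch one archive index page and return the article links found on it.
def get_archive(url)
  charset = nil
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs.
  html = URI.open(url) do |f|
    charset = f.charset
    f.read
  end
  doc = Nokogiri::HTML.parse(html, nil, charset)
  doc.xpath('//h1[@class="article-title"]/a').map do |v|
    v.attribute('href').value
  end
end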
# Crawl archive pages 59 through 100 and store every article URL.
(59..100).each do |page|
  insert('url.db', get_archive("http://ankake.blog.jp/?p=#{page}"))
end
#insert('url.db', ['http://ankake.blog.jp/archives/1008878299.html'])
#puts get_archive('http://ankake.blog.jp/?p=30')
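
The loop above stops at the first page that fails to load and sends its requests back to back. A minimal sketch of a more forgiving crawl loop follows. `collect_urls`, its `delay` and `fetcher` parameters, and the one-second default are hypothetical names and values chosen for illustration; they are not part of the original script.

```ruby
# Hypothetical helper, not part of the original gist.
# Calls `fetcher` for each page number, waits `delay` seconds between
# requests, and keeps going when a single page raises an error.
def collect_urls(pages, delay: 1, fetcher:)
  urls = []
  pages.each_with_index do |page, i|
    # Pause before every request except the first one.
    sleep(delay) if i > 0 && delay > 0
    begin
      urls.concat(fetcher.call(page))
    rescue StandardError => e
      # Report the failure and move on to the next page.
      warn "page #{page} skipped: #{e.message}"
    end
  end
  # Drop duplicate links before they reach the database.
  urls.uniq
end

# Possible wiring with the functions defined in the script:
# fetch_page = ->(page) { get_archive("http://ankake.blog.jp/?p=#{page}") }
# insert('url.db', collect_urls(59..100, fetcher: fetch_page))
```

Passing the fetch step in as a lambda keeps the loop independent of the network, so it can be tried with a stub before pointing it at the blog.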