@urus
Last active May 13, 2018 22:11
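[Ruby] Scrape past Hatena Bookmark hot entries and store them in the DB used by a Rails app. A quick, rough script. It targets Hatena's old layout (up to 2013-01-07), so it will not work as-is.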
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'railsDirectory\config\boot'
require 'railsDirectory\config\environment'
# Scrape the hot-entry page for one date and save each entry to the DB.
def scrapeHotEntry(date)
  puts "scraping... >> " + date.to_s
  hotEntryUrl = "http://b.hatena.ne.jp/hotentry/" + date.to_s.delete("-")
  # URI.open (Ruby 2.5+) is used because Kernel#open no longer accepts URLs on Ruby 3.0+.
  hotEntryHtml = Nokogiri::HTML(URI.open(hotEntryUrl), nil, 'UTF-8')
  hotEntryHtml.search('//li[@class="track-click-entry"]/div[@class="entry-body"]').each do |doc|
    entry = Entry.new
    entry.date = date
    entry.title = doc.search('h3/a/@title').text
    entry.url = doc.search('h3/a/@href').text
    entry.count = doc.search('ul[@class="entry-info"]/li[@class="users"]//a').text
    entry.category = doc.search('ul[@class="entry-info"]/li[@class="category"]//a').text
    # Keep up to five tags; missing tags come back as empty strings.
    tags = doc.search('ul[@class="entry-info"]/li[@class="tags"]')
    entry.tag1 = tags.search('a[1]').text
    entry.tag2 = tags.search('a[2]').text
    entry.tag3 = tags.search('a[3]').text
    entry.tag4 = tags.search('a[4]').text
    entry.tag5 = tags.search('a[5]').text
    entry.save
  end
end
puts 'Start!'
# Load the YAML config and connect to the Rails DB
dbconfig = YAML.load_file('railsDirectory\hoge\config\database.yml')['development']
ActiveRecord::Base.establish_connection(dbconfig)
# Delete all existing records first
Entry.delete_all
# Specify the date range to scrape (no leading zeros: 08 and 09 are invalid octal literals in Ruby)
first = Date::new(2012,12,1)
last = Date::new(2012,12,1)
# Scrape each date and save the results to the DB
for d in first .. last do
  scrapeHotEntry d
end
# Done
puts 'Success!'
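
Because the script deletes every record before scraping, it may help to preview which pages a date range would hit before running it. The following is a minimal sketch using only the standard library; `hot_entry_url` and `hot_entry_urls` are hypothetical helpers that are not part of the original script, and they assume the same URL pattern that `scrapeHotEntry` builds.

```ruby
require 'date'

# Hypothetical helper (not in the original script): builds the archive URL
# for one day, producing the same string as the concatenation in scrapeHotEntry.
def hot_entry_url(date)
  "http://b.hatena.ne.jp/hotentry/" + date.strftime("%Y%m%d")
end

# Hypothetical helper: lists the URLs for an inclusive range of dates.
def hot_entry_urls(first, last)
  (first..last).map { |d| hot_entry_url(d) }
end

# Preview three days of URLs without touching the network or the DB.
puts hot_entry_urls(Date.new(2012, 12, 1), Date.new(2012, 12, 3))
```

`strftime("%Y%m%d")` yields the same eight-digit string as `date.to_s.delete("-")` for the dates used here; it is shown only as an alternative that states the format explicitly.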