@lasershow
Last active December 30, 2016 08:24
require 'mechanize'
require 'spreadsheet'
# http://qiita.com/shizuma/items/d04facaa732f606f00ff
agent = Mechanize.new
page = agent.get('https://www.tripadvisor.jp/Restaurants-g298158-Chiba_Chiba_Prefecture_Kanto.html')
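# Collect the link to each restaurant's detail page from the listing page.
anker_tags = page.search('.shortSellDetails h3 a')
restaurants_url = []
anker_tags.each do |anker_tag|
  # The href values are site-relative, so prepend the host.
  pre_url = anker_tag.get_attribute(:href)
  url = "https://www.tripadvisor.jp" + pre_url
  restaurants_url << url
end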
# Visit each detail page and extract the restaurant name and address.
restaurants_info = []
restaurants_url.each do |restaurant_url|
  restaurant_info = {}
  page = agent.get(restaurant_url)
  # The raw heading text is wrapped in newlines, so strip it.
  restaurant_info[:name] = page.search('#HEADING').text.strip
  restaurant_info[:address] = page.search('.info div address span span').text
  restaurants_info << restaurant_info
end
# Test data (raw scraped values, before any whitespace cleanup)
# restaurants_info = [{:name=>"\n\n豆大福本舗木村屋\n", :address=>"〒260-0018 千葉県千葉市中央区院内2丁目13-11 〒260-0018千葉県千葉市千葉市中央区院内2丁目13-11"},
# {:name=>"\n\nトニーローマ 幕張WBG店\n", :address=>"〒261-7101 千葉県千葉市美浜区中瀬2-6WBGマリブダイニング3F 〒261-7101千葉県千葉市千葉市美浜区中瀬2-6WBGマリブダイニング3F"},
# {:name=>"\n\nmister Donut モノレール千葉駅 ショップ\n", :address=>"〒260-0031 千葉県千葉市中央区新千葉1-1-1地先市道路 〒260-0031千葉県千葉市千葉市中央区新千葉1-1-1地先市道路"},
# {:name=>"\n\n八献 幕張新都心店\n", :address=>"〒261-8535 千葉県千葉市美浜区豊砂1-1イオンモール幕張新都心 グランドモール2F 〒261-8535千葉県千葉市千葉市美浜区豊砂1-1イオンモール幕張新都心 グランドモール2F"}]
# http://qiita.com/Kta-M/items/02a2c41c5624f75498aa
# Create a new workbook
book = Spreadsheet::Workbook.new
sheet = book.create_worksheet(name: 'restaurants')
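# Header row (レストラン名 = restaurant name, 住所 = address)
sheet.row(0).concat %w{レストラン名 住所}
# Write one row per restaurant, starting just below the header.
row_number = 1
restaurants_info.each do |restaurant_info|
  row = sheet.row(row_number)
  row.push restaurant_info[:name]
  row.push restaurant_info[:address]
  row_number += 1
end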
book.write('test.xls')
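
The script above builds absolute URLs by string concatenation, and the sample data shows names surrounded by newlines. As a minimal sketch (not part of the original gist), the two helpers below show one way to handle both using only the standard library. The names `BASE_URL`, `absolute_url`, and `clean_text` and the example path are hypothetical, introduced here for illustration.

```ruby
require 'uri'

# Hypothetical constant: the host used by the script above.
BASE_URL = 'https://www.tripadvisor.jp'

# Resolve a site-relative href against the base URL.
# Assumes the href contains only characters that are valid in a URI.
def absolute_url(href, base = BASE_URL)
  URI.join(base, href).to_s
end

# Collapse every run of whitespace (including newlines) into a single
# space, then trim the ends.
def clean_text(text)
  text.gsub(/[[:space:]]+/, ' ').strip
end

puts absolute_url('/Restaurant_Review-example.html')
puts clean_text("\n\n豆大福本舗木村屋\n")
```

In the scraping loop, `clean_text(page.search('#HEADING').text)` could replace the raw `.text` call. The repeated address text seen in the sample data is a separate issue that depends on the page's markup, and this sketch does not address it.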
@lasershow

Please be mindful of copyright when scraping.