@miyataka
Created October 18, 2018 06:16
scraping memo
source 'https://rubygems.org'
gem 'mechanize'
gem 'csv'
require 'nokogiri'
require 'open-uri'
require 'csv'
## twitter
#url = 'https://twitter.com/fujiwara_coltd'
#url = 'https://twitter.com/ryuji_fujimura'
#charset = nil
#
#html = URI.open(url) do |f|
#  charset = f.charset
#  f.read
#end
#
#doc = Nokogiri::HTML.parse(html, nil, charset)
#doc.xpath('//div[@class="ProfileHeaderCard"]').each do |node|
# p node.xpath('h1[@class="ProfileHeaderCard-name"]/a').text
# p node.xpath('h2[@class="ProfileHeaderCard-screenname u-inlineBlock u-dir"]/a/span/b').text
# p node.xpath('div[@class="ProfileHeaderCard-location"]/a/span/b').text
# #p node.css('a').inner_text
#end
## tabelog
#url = 'https://tabelog.com/tokyo/A1307/A130701/13005298/' # GONPACHI
url = 'https://tabelog.com/tokyo/A1310/A131002/13004170' # CARP
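charset = nil
# Kernel#open no longer opens URLs on Ruby 3.0+; open-uri provides URI.open.
html = URI.open(url) do |f|
  charset = f.charset # charset reported by the response, passed to the parser
  f.read
end
doc = Nokogiri::HTML.parse(html, nil, charset)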
table = doc.xpath('//div[@class="rstinfo-table"]/table')
shop_name_node = table.xpath('.//tbody/tr').first
#p shop_name_node
shop_name = shop_name_node.xpath('td').text
#p shop_name.gsub(/\n| /,'')
p shop_name.gsub(/ |\n/,'')
shop_address_node = table.xpath('.//tbody/tr/td/p[@class="rstinfo-table__address"]')
#p shop_address_node
shop_address = shop_address_node.xpath('span').text
p shop_address
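
The script requires `csv` but never uses it. Below is a minimal sketch of how the scraped values might be written out as CSV, assuming the goal is one row per shop. The helper names `shop_row` and `shops_to_csv`, the `name`/`address` header, and the sample strings are hypothetical and not part of the original memo.

```ruby
require 'csv'

# Hypothetical helper (not in the original memo): build one CSV row
# from the scraped shop name and address.
def shop_row(shop_name, shop_address)
  # Same cleanup as the memo: drop spaces and newlines from the name.
  [shop_name.gsub(/ |\n/, ''), shop_address.strip]
end

# Hypothetical helper: render rows as CSV text with a header line.
def shops_to_csv(rows)
  CSV.generate do |csv|
    csv << %w[name address]
    rows.each { |row| csv << row }
  end
end

# Placeholder strings stand in for the scraped shop_name and shop_address.
row = shop_row("\n  Example Shop\n", ' 1-2-3 Example, Tokyo ')
puts shops_to_csv([row])
```

In the script itself, the call would presumably be `shop_row(shop_name, shop_address)`, and the resulting text could be saved with `File.write`. The output file name is left open here because the memo does not specify one.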