@yalab
Last active June 28, 2019 23:10
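require 'open-uri'
require 'nokogiri'

# Google search for 脆弱性 ("vulnerability"); the q/oq parameters hold the URL-encoded query.
ROOT_URI = 'https://www.google.com/search?source=hp&ei=K5YWXf3qBozK8wXsgqOYDQ&q=%E8%84%86%E5%BC%B1%E6%80%A7&oq=%E8%84%86%E5%BC%B1%E6%80%A7&gs_l=psy-ab.3..0j0i131j0l6.4101.7361..7805...2.0..0.74.1039.17......0....1..gws-wiz.....0..0i4j0i131i4j0i4i70i257.JdDB9CrU7j0'

# Fetch and parse the results page. URI.open (Ruby 2.5+) is used because
# Kernel#open stopped opening URLs in Ruby 3.0.
google_html = URI.open(ROOT_URI, &:read)
google = Nokogiri::HTML(google_html)

REG_DOUBLE_SLASH = %r(\A//) # protocol-relative: //host/path
REG_SLASH = %r(\A/)         # root-relative: /path

# Only anchors that actually carry an href attribute.
google.xpath('//a[@href]').each do |anchor|
  href = anchor[:href]
  # Turn relative hrefs into absolute URLs (match the whole string, not just its first character).
  uri = if href.match?(REG_DOUBLE_SLASH)
          "https:#{href}"
        elsif href.match?(REG_SLASH)
          "https://www.google.com#{href}"
        else
          href
        end
  # Skip in-page anchors, javascript: links and anything else that is not HTTP(S).
  next unless uri.start_with?('http://', 'https://')

  begin
    body = URI.open(uri, &:read)
  rescue StandardError => e
    # One unreachable page should not abort the whole crawl.
    warn "skipped #{uri}: #{e.message}"
    next
  end
  html = Nokogiri::HTML(body)
  # Extract whatever information you need from each page here.
end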
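The `if`/`elsif` chain that builds `uri` resolves relative links by string concatenation. Ruby's standard `URI.join` performs the same relative-reference resolution against a base URL, including protocol-relative (`//host/path`) and root-relative (`/path`) hrefs. The sketch below assumes you only want HTTP(S) links; `resolve_href` and the example URLs are hypothetical, not part of the original gist.

```ruby
require 'uri'

# Hypothetical helper (not in the original gist): resolve an <a href> value
# against the URL of the page it was found on. Returns nil for in-page
# anchors, non-HTTP(S) schemes such as javascript: or mailto:, and hrefs
# that URI cannot parse.
def resolve_href(base, href)
  href = href.to_s.strip
  return nil if href.empty? || href.start_with?('#')

  uri = URI.join(base, href)
  %w[http https].include?(uri.scheme) ? uri.to_s : nil
rescue URI::Error
  nil
end

base = 'https://www.google.com/search?q=test'
puts resolve_href(base, '//example.com/a')      # prints https://example.com/a
puts resolve_href(base, '/url?q=x')             # prints https://www.google.com/url?q=x
puts resolve_href(base, 'https://example.org/') # prints https://example.org/
p resolve_href(base, 'javascript:void(0)')      # prints nil
```

Passing the search URL as `base` means root-relative result links resolve against the host that actually served the page rather than a hard-coded one.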
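Links on the results page may point at Google's own redirect endpoint rather than at the target site. Assuming they use the `/url?q=<target>` form (an assumption about the markup Google served to non-JavaScript clients at the time, not something the gist confirms), the target can be read back out of the query string with `URI.decode_www_form`. `unwrap_google_redirect` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: if url looks like https://www.google.com/url?q=<target>&...,
# return the decoded <target>; otherwise return url unchanged.
# The /url?q= link format is an assumption about Google's result markup.
def unwrap_google_redirect(url)
  uri = URI.parse(url)
  google_host = %w[google.com www.google.com].include?(uri.host)
  return url unless google_host && uri.path == '/url' && uri.query

  params = URI.decode_www_form(uri.query).to_h
  params['q'] || url
rescue URI::Error, ArgumentError
  url
end

puts unwrap_google_redirect('https://www.google.com/url?q=https%3A%2F%2Fexample.com%2Fpage&sa=U')
# prints https://example.com/page
puts unwrap_google_redirect('https://example.org/')
# prints https://example.org/
```

Unwrapping before fetching avoids one redirect hop per result and lets you filter or de-duplicate on the real target host.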