@nebuta
Created November 8, 2011 07:17
Get Google search results
require 'net/http'
require 'cgi'
require 'rubygems'
require 'hpricot'
require 'open-uri'
require "resolv-replace"
require 'timeout'
BASE_URL = "http://www.google.com/search?"
LANG = "ja"
TARGET_LANG = "lang_ja"
CHAR_SET = "utf-8"
# Returns the first unused file name of the form web<N>.html in the current directory.
def filename
  count = 1
  while File.exist? "web#{count}.html"
    count += 1
  end
  "web#{count}.html"
end
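# Save downloaded pages under ./web, creating the directory if needed.
Dir.mkdir('web') unless File.directory?('web')
Dir.chdir('web')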
list = []
if !ARGV[0]
  puts "You need a query parameter for Google."
  exit
end
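# Fetch 10 result pages (10 results each) and collect the result URLs.
(0..9).each do |i|
  query = ARGV[0]
  resultNum = "10"
  startPage = (i * 10).to_s
  parameters = {
    :hl => LANG,          # interface language
    :lr => TARGET_LANG,   # restrict results to this language
    :ie => CHAR_SET,      # input encoding
    :oe => CHAR_SET,      # output encoding
    :num => resultNum,    # results per page
    :start => startPage,  # offset of the first result
    :q => query
  }
  paramString = parameters.collect { |key, value| "#{key}=#{CGI::escape(value)}" }.join('&')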
  uri = URI.parse(BASE_URL + paramString)
  puts uri
  doc = Hpricot(URI.open(uri))
  # Keep only result links that are absolute http:// URLs.
  l = (doc/"h3.r a").map { |e|
    href = e['href']
    href if href && href.index('http://') == 0
  }.compact
  list.concat l
end
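puts
# Download each result page into the next free web<N>.html; skip pages that fail or time out.
list.each do |url|
  begin
    Timeout.timeout(10) do
      URI.open(url) do |input|
        puts url
        File.open(filename, "wb") { |out| out.write input.read }
      end
    end
  rescue StandardError => e
    warn "skipped #{url}: #{e.message}"
  end
end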
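
The query string above is assembled by hand with `CGI::escape`. As a minimal sketch of an alternative, the standard library's `URI.encode_www_form` can build the same string from a hash. The helper name `build_search_url` and its arguments are hypothetical and not part of the original script; the parameter values mirror the constants defined at the top.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): builds the search URL
# for one result page using the standard library's form encoder.
def build_search_url(query, page, base_url = "http://www.google.com/search?")
  params = {
    :hl => "ja",          # interface language
    :lr => "lang_ja",     # restrict results to this language
    :ie => "utf-8",       # input encoding
    :oe => "utf-8",       # output encoding
    :num => 10,           # results per page
    :start => page * 10,  # offset of the first result
    :q => query
  }
  # encode_www_form escapes each key and value and joins the pairs with '&'.
  base_url + URI.encode_www_form(params)
end

puts build_search_url("ruby gist", 2)
```

Spaces in the query are encoded as `+` and reserved characters such as `&` are percent-escaped, so the query needs no separate escaping step.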