@raecoo
Created November 8, 2011 09:20
gmusic spider
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
require 'mechanize'
require 'cgi'
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Mac Safari'
}
# puts "What is your name?"
# $name = STDIN.gets
# puts "Hi "+$name
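# Append a line to downloads.log directly, without going through the shell
# (echo breaks on quotes, parentheses, and other shell metacharacters).
def log(str)
  File.open('downloads.log', 'a') { |f| f.puts(str) }
end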
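abort("Usage: #{$0} SEARCH_TERM") if ARGV.empty?
query = "http://www.google.cn/music/search?q=#{CGI.escape(ARGV.first)}&aq=f"
target_page = a.get(query)
# Mechanize has already parsed the page with Nokogiri, so reuse that document.
doc = target_page.parser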
nodes = doc.xpath('//div[@class="results"]/table[@id="song_list"]/tr')
song_ids = []
puts "-- found #{nodes.size} nodes"
log("=--- #{Time.now.strftime('%Y-%m-%d %H:%M')} found #{nodes.size} target node---=")
log("- Starting...")
nodes.each { |song| song_ids << (song.attribute('id').to_s.gsub('row','')) }
song_ids.each_with_index do |sid, index|
  target = "http://www.google.cn/music/top100/musicdownload?id=#{sid}"
  log("- Download sort : #{index + 1}")
  log("- Download page : #{target}")
  down_base = a.get(target)
  # links_with already filters on href, so no second check is needed.
  down_base.links_with(:href => %r{/music/}).each do |link|
    # The real file URL is carried, percent-encoded, in the q= parameter.
    down = link.href.gsub('/music/top100/url?q=', '').split('&').first
    url = CGI.unescape(down)
    # Decode the last path segment once more, then drop characters that are awkward in file names.
    file_name = CGI.unescape(url.split('/').last).gsub(/\s/, '-').delete("()'")
    log("- File name : #{file_name}")
    log("- File url : #{url}")
    # Pass the arguments as a list so that no shell interprets the URL or file name.
    system('curl', '-o', file_name, url)
    puts "---> curl -o #{file_name} #{url}"
  end
end
log("- Ending...")