@kaixiang-li
Created January 29, 2012 09:41
Blog download script
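# The articles on http://www.geekonomics10000.com/ are consistently good, so I wrote
# this script to scrape them category by category into a single txt file that can be
# read as an e-book on a phone.
require "open-uri"   # URI.open fetches a URL (Kernel#open no longer does on Ruby 3+)
require "hpricot"    # HTML parser used to pick out titles and paragraphs

CATEGORIES = %w{self_develop books movies china conventional_wisdom politics science
  joking us microtrends pop_science social_atom tech critics}

# The block form closes blog.txt even if a request raises an error.
File.open("blog.txt", "w") do |blog|
  CATEGORIES.each do |category|
    puts "downloading category #{category}"
    blog.puts category
    page = 1
    loop do
      # Fetch one listing page of the current category.
      target_url = "http://www.geekonomics10000.com/category/#{category}/page/#{page}"
      doc = Hpricot(URI.open(target_url))
      # Each post title on the listing page links to the full article.
      doc.search("h3.post-title/a").each do |link|
        puts "downloading #{link.inner_html}"
        blog.puts link.inner_html
        post = Hpricot(URI.open(link.attributes["href"]))
        # Write every paragraph of the article body to the file.
        post.search("div.post-content/p").each do |content|
          blog.puts content.inner_html
        end
      end
      page += 1
      # Stop when div.alignleft (the navigation link to older posts) is empty,
      # i.e. this listing page was the last one.
      break if doc.search("div.alignleft").inner_html == ""
    end
    10.times { puts "" }   # blank lines separate categories in the console output
  end
end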
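The paging logic is the part of the script that is easiest to get wrong, so here is a minimal offline sketch of it, assuming a hypothetical `collect_paginated` helper and a `fetch_page` callable that returns `[items, has_next]` for a page number. In the real script `has_next` would come from the non-empty `div.alignleft` check and `items` from the `h3.post-title/a` links; the fake pages below stand in for HTTP requests.

```ruby
# Hypothetical helper (not part of the original gist): walk numbered pages,
# collecting items, until a page reports that no next page exists.
def collect_paginated(fetch_page)
  results = []
  page = 1
  loop do
    items, has_next = fetch_page.call(page)
    results.concat(items)
    # Same stopping rule as the script: the current page has no "older posts" link.
    break unless has_next
    page += 1
  end
  results
end

# Hypothetical fake listing pages used instead of real HTTP fetches.
FAKE_PAGES = {
  1 => [%w[post-a post-b], true],
  2 => [%w[post-c], true],
  3 => [%w[post-d], false]
}

titles = collect_paginated(->(n) { FAKE_PAGES.fetch(n) })
puts titles.inspect  # prints ["post-a", "post-b", "post-c", "post-d"]
```

Passing the fetch step in as a lambda keeps the loop testable without network access; the script itself inlines both steps.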
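Because the script writes `inner_html`, the txt file keeps markup such as `<a>` or `<strong>` and entities like `&amp;`, which show up verbatim in a phone's text reader. A rough cleanup step could look like the sketch below, assuming a hypothetical `html_fragment_to_text` helper applied before each `blog.puts`. It uses plain regexes and decodes only a few common entities, so it approximates rather than replaces a real HTML-to-text conversion (Hpricot elements may also offer `inner_text`, depending on the installed version).

```ruby
# Hypothetical cleanup helper (not part of the original gist).
# Only these common entities are decoded; others are left as-is.
ENTITIES = { "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
             "&quot;" => '"', "&#39;" => "'", "&nbsp;" => " " }.freeze

def html_fragment_to_text(html)
  # Turn explicit <br> tags into newlines so line breaks survive.
  text = html.gsub(/<br\s*\/?>/i, "\n")
  # Drop every remaining tag, keeping only its text content.
  text = text.gsub(/<[^>]+>/, "")
  # Decode entities last so decoded "<" or ">" is not mistaken for a tag.
  text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/) { |entity| ENTITIES[entity] }
end

puts html_fragment_to_text('See <a href="/x">this &amp; that</a><br/>next')
```

In the script this would become `blog.puts html_fragment_to_text(content.inner_html)` for paragraphs, and likewise for titles.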