@easonhan007
Created August 7, 2013 13:07
Scrape the content of a Tianya Yidu (天涯易读) post and print it
#encoding: utf-8
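# Filename: tuoshui.rb
# Scrapes the content of a Tianya Yidu post and prints it.
# Usage:
#   ruby tuoshui.rb [Tianya Yidu post id] > result.txt
# If no post id is given on the command line, the default id 40489 is used.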
require 'watir-webdriver'
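# Build a URL template for the post: %d takes the post id now, while %%d
# becomes a literal %d that is filled in with a page number later.
def build_url(id)
  sprintf('http://www.tianyayidu.cc/article-a-%d-%%d.html', id)
end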
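# Use the post id from the command line, or 40489 by default.
id = ARGV.first ? ARGV.first.to_i : 40489
url = build_url(id)
# Print the URL template to stderr so it does not end up in result.txt.
warn url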
# Fill a page number into the URL template.
def page(index, url)
  sprintf(url, index)
end
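b = Watir::Browser.new :chrome
begin
  # Read the pager on the first page to get the number of pages;
  # fall back to 10 if the pager text contains no number.
  b.goto page(1, url)
  page_text = b.div(:class, 'pageNum1').text
  m = page_text.match(/(\d+)/)
  page_count = m ? m[1].to_i : 10
  # Visit every page and print the text of each matching list item.
  (1..page_count).each do |i|
    b.goto page(i, url)
    b.lis(:class, 'at c h2').each { |li| puts li.text }
  end
ensure
  # Close the browser even if scraping fails part-way through.
  b.quit
end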
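The URL templating and the page-count fallback in the script above can be exercised without launching a browser. The following is a minimal Ruby sketch, assuming the site's URL pattern is the one hard-coded in the script; `page_url` and `page_count_from` are hypothetical helper names (the script calls the first one `page` and inlines the second). Note that the regex picks up the first number in the pager text, so if that text shows the current page before the total, the pattern would need adjusting.

```ruby
# Browser-free sketch of the script's string handling.
# page_url and page_count_from are hypothetical names used for illustration.

# First pass: %d receives the post id, %%d collapses to a literal %d.
def build_url(id)
  sprintf('http://www.tianyayidu.cc/article-a-%d-%%d.html', id)
end

# Second pass: fill the remaining %d with a page number.
def page_url(template, index)
  sprintf(template, index)
end

# Take the first run of digits in the pager text; fall back to a default
# (10, as in the script) when the text is empty, nil, or has no digits.
def page_count_from(text, default = 10)
  m = text.to_s.match(/\d+/)
  m ? m[0].to_i : default
end

template = build_url(40489)
puts template                 # http://www.tianyayidu.cc/article-a-40489-%d.html
puts page_url(template, 3)    # http://www.tianyayidu.cc/article-a-40489-3.html
puts page_count_from('25')    # 25
puts page_count_from('')      # 10
```

The two-pass `sprintf` keeps the post id and the page number separate, so the template can be built once and reused for every page in the loop.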