@koki-h
Created June 8, 2009 12:55
# Fetch past Twitter logs and write them out tab-separated, one page at a time
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'date'
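# First and last page to fetch, taken from the command line.
# ARGV holds Strings, so convert them; Integer() raises on bad or missing input.
MIN_PAGE = Integer(ARGV[0])
MAX_PAGE = Integer(ARGV[1])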
#MAX_PAGE = 2
# Get a Nokogiri::HTML::Document for each page we're interested in...
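MIN_PAGE.upto MAX_PAGE do |i|
  id = Time.now.to_i # Unix timestamp keeps file names distinct across runs
  page = sprintf("%03d", i)
  raw_log = open "raw_twitter_log_#{page}_#{id}.html", 'w'
  formed_log = open "twitter_log_#{page}_#{id}.txt", 'w'
  begin
    puts "Fetching page #{i}."
    # Kernel#open no longer accepts URLs as of Ruby 3.0; open-uri provides URI.open instead
    raw_doc = URI.open("http://twitter.com/koki_h?page=#{i}")
    # Save the untouched HTML, then rewind so Nokogiri can read it from the start
    raw_log.write(raw_doc.read)
    raw_doc.rewind
    doc = Nokogiri::HTML(raw_doc)
    doc.css('#content li').each do |c|
      # Columns: tweet text, first link in the tweet, permalink, timestamp
      printf formed_log, "%s\t", c.css('.entry-content')[0].content
      content_link = c.css('.entry-content a')
      # Always emit the link column (empty when absent) so columns stay aligned
      printf formed_log, "%s\t", content_link[0] ? content_link[0]['href'] : ''
      printf formed_log, "%s\t", c.css('.entry-date')[0]['href']
      printf formed_log, "%s\t", c.css('.published')[0].content
      printf formed_log, "\n"
    end
  ensure
    raw_log.close
    formed_log.close
  end
  sleep(5) # Pause between requests
end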
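
The script reads its page range from the command line and writes each tweet as one tab-separated row. Two steps are fragile: the arguments are not validated, and a tweet that contains a tab or line break would split its row. The sketch below shows one way to handle both using only the standard library. `parse_page_range` and `tsv_row` are hypothetical helper names introduced here for illustration; they are not part of the original script.

```ruby
# Hypothetical helpers, not part of the original script.

# Convert the two command-line arguments into an Integer page range.
# Raises ArgumentError if an argument is missing, not a whole number,
# or if the range is reversed.
def parse_page_range(argv)
  raise ArgumentError, "usage: MIN_PAGE MAX_PAGE" unless argv.length == 2
  min_page, max_page = argv.map { |arg| Integer(arg, 10) }
  raise ArgumentError, "MIN_PAGE must not exceed MAX_PAGE" if min_page > max_page
  (min_page..max_page)
end

# Join fields into one tab-separated line. Runs of tabs and line breaks
# inside a field are collapsed to a single space so each tweet stays on
# one row; nil fields become empty strings.
def tsv_row(fields)
  fields.map { |f| f.to_s.gsub(/[\t\r\n]+/, " ").strip }.join("\t") + "\n"
end
```

With these in place, the main loop could iterate over `parse_page_range(ARGV)` and call `formed_log.write(tsv_row([text, link, permalink, published]))` once per tweet, where the four variables stand for the values the script already extracts.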