@kimoto
Created May 12, 2015 09:40
CSResulter to CSV
#!/usr/bin/env ruby
# encoding: utf-8
require 'nokogiri'
require 'net/http'
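# Column headers as they appear on the scraped page; they must match the site's Japanese text.
MEMBER_KEY = "メンバー" # "Members" column
DAY_KEY = "日付"        # "Date" column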
# Fetch one result page and return its table rows as an array of
# { header text => cell text } hashes.
def crawl(url)
  doc = Nokogiri::HTML(Net::HTTP.get(URI(url)))
  headers = doc.search("#result_table > thead > tr > th").map(&:text)
  rows = doc.search("#result_table > tbody > tr")
  rows.map { |row|
    data = {}
    row.search("td").each_with_index { |cell, i|
      key = headers[i]
      # The member column holds one link per member; join their names with commas.
      data[key] = if key == MEMBER_KEY
        cell.search("a").map(&:text).join(',')
      else
        cell.text
      end
    }
    # Skip the summary row, whose date cell reads "計" (total).
    next if data[DAY_KEY] == '計'
    data
  }.compact
end
# The clan ID is the first command-line argument.
clan_id = ARGV.shift or raise ArgumentError, "clan_id argument is required"
TARGET_URL = "http://csr.no-eta.net/%clan_id%/?page=%page_number%"
SEPARATOR = "\t"
LINE_FEED = "\n"

page_number = 1
all_results = []
# Walk the pages in order until one comes back with no records.
loop do
  url = TARGET_URL.gsub(/%.+?%/, {
    "%page_number%" => page_number.to_s,
    "%clan_id%" => clan_id,
  })
  STDERR.puts "url: #{url}"
  results = crawl(url)
  STDERR.puts "fetched records: #{results.size}"
  if results.empty?
    STDERR.puts "no records found; probably past the last page."
    break
  end
  all_results += results
  page_number += 1
end

# Print a header row followed by one tab-separated line per record.
field_keys = all_results.map(&:keys).flatten.uniq
puts field_keys.join(SEPARATOR)
puts all_results.map { |result|
  field_keys.map { |key| result[key] }.join(SEPARATOR)
}.join(LINE_FEED)
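
The script above joins cell text with tabs as-is, so a cell that itself contains a tab or a line break would shift the columns of its row. The following is a minimal sketch of one way to guard against that, not part of the original gist: `to_tsv` is a hypothetical helper name, and replacing embedded whitespace with a single space is an assumption about what the output should look like.

```ruby
# Hypothetical helper (not in the original script): render an array of
# hashes as tab-separated text, one header line plus one line per record.
def to_tsv(records, separator: "\t")
  # Collect every key seen in any record, in first-seen order.
  keys = records.flat_map(&:keys).uniq
  # Assumption: embedded tabs and line breaks collapse to a single space.
  clean = ->(value) { value.to_s.gsub(/[\t\r\n]+/, ' ').strip }
  lines = [keys.map(&clean).join(separator)]
  records.each do |record|
    # Keys missing from a record become empty cells.
    lines << keys.map { |key| clean.call(record[key]) }.join(separator)
  end
  lines.join("\n")
end
```

With a helper like this, the final output step could be reduced to `puts to_tsv(all_results)`.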