@nuna
Created December 22, 2010 01:39
Insert Apache access log entries into MongoDB
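#!/usr/bin/ruby19
# -*- coding: utf-8 -*-
# Reads Apache access logs from ARGF (files named on the command line, or
# stdin) and stores successful GET requests in MongoDB.
# Written against Ruby 1.9 and the 1.x mongo gem (Mongo::Connection);
# URI.escape was removed in Ruby 3.0.
require 'apachelogregex'
require 'uri'
require 'mongo'

# Combined log format plus a trailing cookie field.
format = '%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{Cookie}i\"'
parser = ApacheLogRegex.new(format)
# Matches %t values such as [22/Dec/2010:01:39:00 +0900]; only +0900 is accepted.
datetime_pat = /\A\[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) \+0900\]\z/

connection = Mongo::Connection.new
db = connection.db('apache-log')
logs = db.collection('log')

discarded_lines = 0
ARGF.each do |line|
  # Count lines that do not match the log format and move on.
  log = parser.parse(line.chomp)
  unless log
    discarded_lines += 1
    next
  end

  # Keep only successful GET requests with a well-formed timestamp.
  next unless log['%>s'].to_s == '200'
  next unless log['%r'] =~ /^GET/i
  matched = datetime_pat.match(log['%t'])
  next unless matched

  # Time.local reads the fields in the machine's local zone, so this assumes
  # the script runs in the same +0900 zone as the server.
  time = Time.local(matched[3], matched[2], matched[1], matched[4], matched[5], matched[6])
  bytes = log['%b'].to_i # "-" (no response body) becomes 0

  # Turn the request line into a pseudo-URL so URI can normalize it;
  # skip the line if it still cannot be parsed.
  begin
    uri = URI.parse(URI.escape(log['%r'].sub(/^GET /i, 'http:/'))).normalize.to_s
  rescue
    next
  end

  h = { :time => time, :bytes => bytes, :uri => uri }
  logs.insert(h)
end

STDERR.puts "discarded lines: #{discarded_lines}"
STDERR.puts 'done'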
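
The script above accepts only `+0900` timestamps and relies on `URI.escape`, which no longer exists in Ruby 3.0 and later. The following is a minimal sketch, not part of the original gist, of how those two steps might look using only the standard library. `parse_apache_time` and `request_target` are hypothetical helper names introduced here for illustration. Note that `request_target` is a different technique from the script's `'http:/'` substitution: it keeps only the request target (path and query) and drops the method and protocol tokens.

```ruby
require 'time'
require 'uri'

# Hypothetical helper: parse an Apache %t field such as
# "[22/Dec/2010:01:39:00 +0900]" using whatever UTC offset it carries.
# Returns nil when the field does not match the expected layout.
def parse_apache_time(field)
  Time.strptime(field, '[%d/%b/%Y:%H:%M:%S %z]')
rescue ArgumentError, TypeError
  nil
end

# Hypothetical helper: extract the request target from a %r field such as
# "GET /index.html?q=1 HTTP/1.1" and normalize it as a URI reference.
# Returns nil for non-GET requests and for targets URI cannot parse.
def request_target(request_line)
  method, target, _protocol = request_line.to_s.split(' ', 3)
  return nil unless method && target && method.casecmp('GET').zero?
  URI.parse(target).normalize.to_s
rescue URI::InvalidURIError
  nil
end

# Example usage with made-up log fields.
time = parse_apache_time('[22/Dec/2010:01:39:00 +0900]')
uri  = request_target('GET /index.html?q=1 HTTP/1.1')
puts "#{time.inspect} #{uri.inspect}"
```

Inside the main loop, a `nil` from either helper would take the place of the `next unless matched` and `rescue ... next` branches. Whether to store the bare path or a full URL depends on how the collection is queried later, so treat the return format as an assumption.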