@svyatov
Created November 24, 2012 00:54
nginx log parser: reports the top 10 IPs by traffic, plus some additional info per IP (page hits, URL hits, file extensions)
bytes_per_ip = Hash.new(0)   # ip => total bytes sent
urls_per_ip = Hash.new(0)    # ip => number of matched GET requests
pages_per_ip = Hash.new(0)   # ip => number of requests for .html pages
files_extensions_per_ip = {} # ip => { extension => hits } for non-html files
# i = 0
File.foreach ARGV.shift do |line|
  # Line example: [23/Nov/2012:06:26:53 +0400] 123.123.123.123 "GET /images/socializ/google-buzz.png HTTP/1.1" 200 2283 "-" www.domain.ru "http://www.domain.ru/page.html" "Opera/9.80 (J2ME/MIDP; Opera Mini/5.1.24009/28.3126; U; ru) Presto/2.8.119 Version/11.10"
  # Skip lines that do not match (e.g. non-GET requests), so they are not counted under a nil ip
  next unless /\A\[.+?\]\s(?<ip>.+?)\s"GET (?<url>.+?)\s.+?"\s[0-9]+\s(?<bytes>[0-9]+)\s/ =~ line
  bytes_per_ip[ip] += bytes.to_i
  urls_per_ip[ip] += 1
  # Extension is captured only if the url ends with a dot and 3-4 lowercase letters
  /(?:\.(?<url_file_extension>[a-z]{3,4}))?\z/ =~ url
  pages_per_ip[ip] += 1 if url_file_extension == 'html'
  if url_file_extension && url_file_extension != 'html'
    files_extensions_per_ip[ip] ||= Hash.new(0)
    files_extensions_per_ip[ip][url_file_extension] += 1
  end
  # i += 1
  # break if i > 1000
end
# Sort by bytes in descending order and keep the top 10 [ip, bytes] pairs
# Note: "files hits" prints urls_per_ip, i.e. all matched requests for the ip, pages included
bytes_per_ip.sort_by { |_ip, bytes| -bytes }.first(10).each do |ip, bytes|
  puts "%15s => %5.2f Gb [pages hits: %i; files hits: %i, extensions: %s]" %
    [ ip,
      bytes.to_f / (1024**3), # converting bytes to Gb
      pages_per_ip[ip],
      urls_per_ip[ip],
      files_extensions_per_ip[ip] ]
end
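
To see what the two patterns in the script capture, here is a minimal sketch that applies them to the example line from the comment (with a shortened user agent). The names LINE_RE, EXT_RE, SAMPLE and parse_line are hypothetical and are not part of the gist; only the patterns themselves are copied from the script.

```ruby
# Patterns copied from the script; constant and method names are hypothetical.
LINE_RE = /\A\[.+?\]\s(?<ip>.+?)\s"GET (?<url>.+?)\s.+?"\s[0-9]+\s(?<bytes>[0-9]+)\s/
EXT_RE  = /(?:\.(?<ext>[a-z]{3,4}))?\z/

# Example line from the script's comment, with a shortened user agent.
SAMPLE = '[23/Nov/2012:06:26:53 +0400] 123.123.123.123 ' \
         '"GET /images/socializ/google-buzz.png HTTP/1.1" 200 2283 "-" ' \
         'www.domain.ru "http://www.domain.ru/page.html" "Opera/9.80"'

# Returns a hash with :ip, :url, :bytes and :ext for a matching GET line,
# or nil when the line does not match (for example a POST request).
def parse_line(line)
  m = LINE_RE.match(line)
  return nil unless m
  {
    ip:    m[:ip],
    url:   m[:url],
    bytes: m[:bytes].to_i,
    ext:   EXT_RE.match(m[:url])[:ext] # nil when the url has no 3-4 letter extension
  }
end

parsed = parse_line(SAMPLE)
puts parsed[:ip], parsed[:url], parsed[:bytes], parsed[:ext]
```

Because the extension pattern is anchored at the end of the url, a url with a query string (such as /page.php?id=1) yields no extension and is counted only in the total hits.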

svyatov commented Nov 24, 2012

Usage: ruby analyse_nginx_log.rb nginx-access.log

Output example:

 68.200.110.185 => 15.08 Gb [pages hits: 11358; files hits: 166721, extensions: {"jpg"=>145385, "jpeg"=>3139, "php"=>7, "net"=>1, "com"=>3, "xml"=>42, "swf"=>13, "flv"=>1, "gif"=>35, "css"=>1, "ico"=>1, "png"=>12}]
   62.212.76.84 =>  0.59 Gb [pages hits: 0; files hits: 41975, extensions: {"jpg"=>262}]
  217.69.133.69 =>  0.52 Gb [pages hits: 1; files hits: 4800, extensions: {"jpg"=>3590, "png"=>10, "jpeg"=>14, "gif"=>7}]
   89.250.14.40 =>  0.51 Gb [pages hits: 294; files hits: 18117, extensions: {"jpg"=>10359, "gif"=>4486, "png"=>2246, "php"=>530, "jpeg"=>6, "ico"=>11, "css"=>1}]
   66.249.76.10 =>  0.34 Gb [pages hits: 1417; files hits: 6187, extensions: {"jpg"=>3650, "jpeg"=>76, "swf"=>6, "xml"=>4, "txt"=>2, "gif"=>33, "png"=>13, "php"=>1}]
   194.8.130.79 =>  0.33 Gb [pages hits: 315; files hits: 6215, extensions: {"css"=>2, "ico"=>4, "gif"=>318, "jpg"=>5117, "php"=>273, "png"=>90, "jpeg"=>80}]
    2.94.140.87 =>  0.30 Gb [pages hits: 77; files hits: 1894, extensions: {"jpg"=>536, "jpeg"=>1155, "php"=>77, "gif"=>12, "png"=>10}]
 212.178.24.174 =>  0.28 Gb [pages hits: 123; files hits: 4908, extensions: {"css"=>1, "gif"=>104, "jpg"=>4226, "jpeg"=>138, "ico"=>1, "png"=>36, "php"=>100, "cur"=>1}]
  82.151.114.56 =>  0.28 Gb [pages hits: 19; files hits: 652, extensions: {"jpg"=>319, "gif"=>35, "css"=>1, "png"=>14, "ico"=>1, "php"=>18, "jpeg"=>241}]
   83.68.37.106 =>  0.28 Gb [pages hits: 237; files hits: 4784, extensions: {"php"=>224, "jpg"=>3863, "gif"=>24, "png"=>44, "css"=>1, "jpeg"=>345}]

3.2 million lines are parsed in 55 seconds. Any suggestions on how to improve this?
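
One direction that might be worth measuring, sketched here without any benchmark behind it: replace the lazy `.+?` groups with negated character classes so the regex engine has less to backtrack over. TIGHT_RE, ORIGINAL_RE, SAMPLE and captures_of are hypothetical names. The tighter pattern assumes fields are separated by single spaces and that urls contain no spaces, which the original pattern does not require, so it may reject some lines the original accepts.

```ruby
# Original pattern from the script.
ORIGINAL_RE = /\A\[.+?\]\s(?<ip>.+?)\s"GET (?<url>.+?)\s.+?"\s[0-9]+\s(?<bytes>[0-9]+)\s/

# Hypothetical stricter variant: negated classes instead of lazy quantifiers.
# Assumes single-space separators and no spaces inside the url.
TIGHT_RE = /\A\[[^\]]+\] (?<ip>\S+) "GET (?<url>\S+) [^"]*" [0-9]+ (?<bytes>[0-9]+) /

# Example line from the script's comment, with a shortened user agent.
SAMPLE = '[23/Nov/2012:06:26:53 +0400] 123.123.123.123 ' \
         '"GET /images/socializ/google-buzz.png HTTP/1.1" 200 2283 "-" ' \
         'www.domain.ru "http://www.domain.ru/page.html" "Opera/9.80"'

# Returns { "ip" => ..., "url" => ..., "bytes" => ... } or nil if there is no match.
def captures_of(regex, line)
  m = regex.match(line)
  m && m.names.zip(m.captures).to_h
end

# Both patterns should agree on the sample line before any timing is compared.
puts captures_of(ORIGINAL_RE, SAMPLE) == captures_of(TIGHT_RE, SAMPLE)
```

Whether this is actually faster on a real log would have to be checked, for example by timing both patterns over the same file with Ruby's Benchmark module.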
