@dominictarr
Created December 2, 2010 00:27
Parse the monthly stats from awstats.***.tld
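#!/usr/bin/env ruby
# The original required "eregex", a Ruby 1.8 library that was removed in 1.9.
# Nothing in this script uses it, so the require is dropped.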
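@patterns = Hash.new # pattern (Regexp) => group name (regex-escaped String)
@keys = []           # patterns in registration order; the first match wins

# Register a path prefix (String) or a ready-made Regexp, together with the
# group its hits are summed into. Without a group, the path is its own group.
def path(dir, group = nil)
  if dir.is_a? String
    dir = Regexp.escape(dir)
    key = Regexp.new("^/#{dir}/?.*")
  else
    key = dir
  end
  @keys << key
  if group.nil?
    @patterns.store(key, dir)
  else
    @patterns.store(key, Regexp.escape(group))
  end
end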
# Summarise the counts by grouping rows.
# Patterns and groups are stored in a hash: each pattern maps to a group.
# Usually the group is just the path itself.
# Patterns are tried in the order they are registered, so a more specific
# path must be registered before its parent.
# The special cases are defined here, as:
#   path("path", "group to summarise into")
path("a/cam-era","our-services/online-services/cam-era")
path("news-and-publications/publications/all/wa","news-and-publications/publications/water_and_atmosphere")
path("__data","DISCARD")
path("our-services/online-services/tides/tides","our-services/online-services/tides")
["news-and-publications/news",
"news-and-publications/publications", # separate water & atmosphere from the rest of publications
"news-and-publications",
"education-and-training",
"our-services/online-services/satellite-data-services",
"our-services/instruments",
"our-science/climate",
"our-science/energy",
"our-science/aquatic-biodiversity-and-biosecurity",
"our-science/aquaculture-and-biotechnology",
"our-science/freshwater",
"our-science/fisheries",
"our-science/oceans",
"our-science/coasts",
"our-science/atmosphere",
"our-science/vessels",
"our-science/te-kuwaha",
"our-science/natural-hazards",
"our-science/pacific-rim",
"our-science",
"our-services/online-services/cam-era",
"our-services/online-services",
"our-services",
"about-niwa",
"common-questions",
"events",
"search"
].each {|it| path(it)}
path("home","home")
path(/^\/$/,"home") #matches "/" but not "/something_after_slash"
@summ = Hash.new # group name => running total

# Add count (a numeric string) to the running total for group.
def add(group, count)
  @summ[group] = @summ.fetch(group, 0) + count.to_i
end
# Read the stats file named on the command line. On each row the first
# column is treated as the URL and the second as the count to sum.
# File.foreach closes the file when it is done; File.open(...).each did not.
File.foreach(ARGV[0]) { |ln|
  tab = ln.split(/\s/)
  if tab.length > 1
    # @keys keeps registration order, so the first matching pattern wins.
    pattern = @keys.find { |k| tab[0] =~ k }
    add(pattern ? @patterns[pattern] : "other", tab[1])
  end
}
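# Print one "group,count" line per registered group, in registration order.
# Hits that matched no pattern are tallied under "other", which is not a
# registered group and so is not printed.
groups = @keys.collect { |k| @patterns[k] }.uniq
groups.each { |group|
  if group != "DISCARD"
    count = @summ[group]
    # Group names were stored regex-escaped. gsub restores every "\-" to "-";
    # sub only fixed the first hyphen in each name.
    g = group.gsub(/\\-/, '-')
    puts "#{g},#{count}"
  end
}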
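
The first-match grouping described in the script's comments can be tried without a stats file. What follows is only a minimal sketch, assuming each input line is a URL followed by a whitespace-separated count. The names `rules`, `summarise`, and `sample` are hypothetical and do not appear in the script above, and the sample rows are made up for illustration. Unlike the script, this sketch keeps group names unescaped and also prints the "other" bucket.

```ruby
# Hypothetical, self-contained sketch of first-match grouping.
# Ordered [pattern, group] pairs: the first pattern that matches a URL
# decides its group, so specific paths come before their parents.
rules = [
  ["our-science/climate", "our-science/climate"],
  ["our-science",         "our-science"],
  ["__data",              "DISCARD"],
].map { |dir, group| [%r{\A/#{Regexp.escape(dir)}/?.*}, group] }
rules << [%r{\A/\z}, "home"] # matches "/" only

# Sum the count column per group; URLs matching no rule go to "other".
def summarise(lines, rules)
  totals = Hash.new(0)
  lines.each do |line|
    url, count = line.split(/\s+/)
    next if count.nil? # skip blank or single-column lines
    _, group = rules.find { |pattern, _| url =~ pattern }
    totals[group || "other"] += count.to_i
  end
  totals
end

# Made-up sample rows, for illustration only.
sample = [
  "/our-science/climate/news 10",
  "/our-science/oceans 5",
  "/ 7",
  "/__data/assets/x.pdf 3",
  "/unknown 2",
]

totals = summarise(sample, rules)
totals.each { |group, count| puts "#{group},#{count}" unless group == "DISCARD" }
```

Swapping the first two rules would send the climate row to "our-science", which is why the script registers the specific paths before "our-science".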