@srikanthlogic
Created July 20, 2012 22:15
Script to download Wikimedia squid logs and extract only the Tamil (ta) entries for one month.
#!/bin/bash
# Script to download Wikimedia squid logs and extract only the Tamil (ta) entries for one month.
# TODO : Get pageview stats on article space alone.
# (C) Srikanth Logic - srik.lak@gmail.com - GPLv2 or more
SERVER="http://dumps.wikimedia.org/other/pagecounts-raw/2012/2012-06/"
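for day in $(seq -f '%02g' 1 30)
do
    for i in $(seq -f '%02g' 0 23)
    do
        # Try the usual HH0000 timestamp first.
        FILENAME="pagecounts-201206$day-${i}0000"
        echo "$SERVER$FILENAME"
        if wget "$SERVER$FILENAME.gz" 2>/dev/null; then
            echo "$FILENAME OK"
        else
            echo "$FILENAME NOTOK"
            # Retry with the HH0001 timestamp.
            FILENAME="pagecounts-201206$day-${i}0001"
            wget "$SERVER$FILENAME.gz"
        fi
        # Extract only if one of the two downloads produced a file.
        if [ -f "$FILENAME.gz" ]; then
            # Keep only rows whose project code is exactly "ta" (Tamil Wikipedia).
            gunzip -c "$FILENAME.gz" | grep '^ta ' >> tamilstats201206
        fi
        rm -f pagecounts-*
        echo "Completed $day$i"
    done
    echo "Completed $day"
done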
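
The TODO at the top (pageview stats for article space alone) could be approached roughly as below. This is only a sketch, not part of the original script. It assumes each row of tamilstats201206 has the form "project title count bytes", separated by spaces. The function name filter_article_space and the NAMESPACES list are made up for illustration; the list holds English prefixes only and is incomplete, so localized Tamil prefixes (possibly percent-encoded in the logs) would still need to be added.

```shell
#!/bin/bash
# Sketch: keep only article-space (main namespace) rows from the extracted stats.
# Assumption: rows look like "project title count bytes", space-separated.
# Assumption: NAMESPACES is an illustrative, incomplete list of English prefixes.
NAMESPACES='Talk|User|User_talk|Wikipedia|Wikipedia_talk|File|File_talk|MediaWiki|MediaWiki_talk|Template|Template_talk|Help|Help_talk|Category|Category_talk|Special'

filter_article_space() {
    # Keep rows for project "ta" whose title (field 2) does not start
    # with "<Namespace>:".
    awk -v ns="^(${NAMESPACES}):" '$1 == "ta" && $2 !~ ns'
}

# Usage (hypothetical output file name):
# filter_article_space < tamilstats201206 > tamilstats201206.articles
```

Matching on a fixed prefix list, instead of dropping every title that contains a colon, keeps articles whose own titles include a colon.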