Bash script to parse Apache log for a count of RSS subscribers and email it to you
#!/bin/bash
# --- Required variables ---
RSS_URI="/rss"
MAIL_TO="your@email.com"
LOG_FILE="/var/log/httpd/access_log"
LOG_DATE_FORMAT="%d/%b/%Y"
# --- Optional customization ---
MAIL_SUBJECT="RSS feed subscribers"
# Date expression for yesterday
DATE="-1 day"
# Locale for printf number formatting (e.g. "10000" => "10,000")
LANG=en_US
# Date format for display in emails
HUMAN_FDATE=`date -d "$DATE" '+%F'`
# --- The actual log parsing ---
LOG_FDATE=`date -d "$DATE" "+${LOG_DATE_FORMAT}"`
DAY_BEFORE_FDATE=`date -d "$DATE -1 day" "+${LOG_DATE_FORMAT}"`
# Unique IPs requesting RSS, except those reporting "subscribers":
IPSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI" | egrep -v '[0-9]+ subscribers' | cut -d' ' -f 1 | sort | uniq | wc -l`
# Google Reader subscribers and other user-agents reporting "subscribers" and using the "feed-id" parameter for uniqueness:
GRSUBS=`egrep "($LOG_FDATE|$DAY_BEFORE_FDATE)" "$LOG_FILE" | fgrep " $RSS_URI" | egrep -o '[0-9]+ subscribers; feed-id=[0-9]+' | sort -t= -k2 -s | tac | uniq -f2 | awk '{s+=$1} END {print s+0}'`
# Other user-agents reporting "subscribers", deduplicated on the first word of the user-agent string:
OTHERSUBS=`fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI" | fgrep -v 'subscribers; feed-id=' | egrep '[0-9]+ subscribers' | egrep -o '"[^"]+"$' | sort -t\( -k2 -sr | awk '!x[$1]++' | egrep -o '[0-9]+ subscribers' | awk '{s+=$1} END {print s+0}'`
REPORT=$(
printf "Feed stats for %s:\n\n" "$HUMAN_FDATE"
printf "%'8d Google Reader subscribers\n" "${GRSUBS:-0}"
printf "%'8d subscribers from other aggregators\n" "${OTHERSUBS:-0}"
printf "%'8d direct subscribers\n" "$IPSUBS"
echo "--------"
# Shell arithmetic treats empty variables as 0, whereas expr fails on them.
printf "%'8d total subscribers\n" "$((GRSUBS + OTHERSUBS + IPSUBS))"
)
echo "$REPORT"
echo ""
echo "Also emailed to $MAIL_TO."
echo "$REPORT" | mail -s "[$HUMAN_FDATE] $MAIL_SUBJECT" "$MAIL_TO"

MarcoSero commented Sep 25, 2012

That's really useful. Do you know if something like that already exists for nginx?

ManxStef commented Sep 25, 2012

Thanks, Marco! That's super handy.

@MarcoSero This works fine with nginx as-is, assuming you've not tinkered with your log format. (Nginx defaults to "combined", which is the same format as Apache's standard log format, as far as I'm aware.) I've just tested it on my server (running nginx+php-fpm) and it works perfectly.

Oh, one thing for people to note: if you run it in cron after logrotate has rotated your logs, make sure you point it at the .1 previous log (e.g. /var/log/nginx/access.log.1) otherwise you'll get an empty result.
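
One way to handle the rotation case is to pick the log file at run time. This is only a minimal sketch, assuming logrotate leaves an uncompressed `.1` copy next to the live log; the `pick_log_file` helper name is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: print the first log file that contains lines for the
# given date string, trying the live log first and then the rotated ".1" copy.
pick_log_file() {
    local log_file="$1" log_date="$2" candidate
    for candidate in "$log_file" "$log_file.1"; do
        # -F matches the date as a fixed string, -q only reports success/failure
        if [ -r "$candidate" ] && grep -qF -- "$log_date" "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1  # neither file has entries for that date
}
```

In the script this could replace the fixed path once `LOG_FDATE` has been computed, for example `LOG_FILE=$(pick_log_file "/var/log/nginx/access.log" "$LOG_FDATE") || exit 1`.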

zanshin commented Sep 25, 2012

Outstanding script. It's up and running on my WebFaction hosted site.

grimreaper commented Sep 25, 2012

The script uses #!/bin/bash instead of #!/usr/bin/env bash, which breaks on systems where bash is installed somewhere other than /bin.

It also has some GNUisms, which I won't comment on.

ghost commented Sep 25, 2012

Great script! Do all feed readers publish that data?

sprugman commented Sep 26, 2012

I use Google Reader to sync between devices, but never use its web UI. Don't know if that comes through in your data.

jamesnvc commented Sep 26, 2012

Why sort | uniq instead of sort -u?

Also, isn't the cut in cut -d' ' -f 1 | awk '{s+=$1}'… redundant, since AWK already extracts the first whitespace-delimited field?
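
To illustrate the `sort -u` suggestion, here is a small sketch of the unique-IP count with the two stages merged. The `count_unique_ips` function name is hypothetical; it reads log lines on stdin and assumes the client IP is the first space-delimited field, as in the script.

```shell
#!/bin/bash
# Hypothetical helper: count distinct values of the first space-delimited
# field (the client IP) across the lines read from stdin.
count_unique_ips() {
    # sort -u sorts and drops duplicates in one step, replacing "sort | uniq".
    # tr strips the padding that some wc implementations add around the count.
    cut -d' ' -f1 | sort -u | wc -l | tr -d ' '
}
```

It would slot in after the two `fgrep` filters, e.g. `fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI" | count_unique_ips`.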

picajoso commented Oct 1, 2012

I'm afraid this doesn't work on my Nginx/PHP5-FPM installation. It gives me a 0 result on all fields, and I know for sure I've got hundreds of RSS followers. I've tried the global nginx access.log (no access_log here) and also the site access.log.

Any ideas?

lhagan commented Oct 13, 2012

Works fine with my lighttpd logs. This is very Linux-centric though; I had to make a number of minor edits to get it running on NetBSD and OS X: https://gist.github.com/3885597
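
For the date arithmetic specifically, a portable fallback might look like the sketch below. It assumes the non-GNU `date` supports the `-v` adjustment flag (as on OS X and FreeBSD); the `yesterday` helper name is hypothetical, and this is not taken from the linked gist.

```shell
#!/bin/bash
# Hypothetical helper: print yesterday's date in the given strftime format.
# Tries GNU date's relative "-d" syntax first, then falls back to the
# BSD-style "-v-1d" adjustment (assumed to be available when "-d" fails).
yesterday() {
    local fmt="$1"
    if date -d "-1 day" "+$fmt" 2>/dev/null; then
        return 0
    fi
    date -v-1d "+$fmt"
}
```

With this in place, `LOG_FDATE=$(yesterday "$LOG_DATE_FORMAT")` would stand in for the `date -d "$DATE"` call in the script.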

Sneagan commented Mar 24, 2013

Had the same issue as @picajoso. I'm no pro when it comes to bash scripts, but I think I got it working.

Change all instances of: fgrep "$LOG_FDATE" "$LOG_FILE" | fgrep " $RSS_URI" |
to: fgrep " $RSS_URI" "$LOG_FILE" |

If I'm right, the problem is that @marcoarment's log structure includes date and file extension in the actual log entry. This was not the case for mine. What I replaced it with searches for the $RSS_URI in the log file at $LOG_FILE. The numbers I'm getting look correct, but if I'm making a mistake please let me know!
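
Note that dropping the date filter makes the counts cover every request in the log file rather than a single day. To check which filter is actually failing before removing one, a quick diagnostic along these lines may help. It is only a sketch, and the `diagnose_filters` name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report how many log lines match the date filter,
# the URI filter, and both together, to show which one yields zero.
diagnose_filters() {
    local log_file="$1" log_date="$2" rss_uri="$3"
    # grep -c prints the number of matching lines; -F matches fixed strings
    printf 'date matches: %s\n' "$(grep -cF -- "$log_date" "$log_file")"
    printf 'URI matches: %s\n' "$(grep -cF -- " $rss_uri" "$log_file")"
    printf 'both: %s\n' "$(grep -F -- "$log_date" "$log_file" | grep -cF -- " $rss_uri")"
}
```

Calling `diagnose_filters "$LOG_FILE" "$LOG_FDATE" "$RSS_URI"` with the script's variables shows whether the date format or the feed URI is the mismatch: a zero on the first line points at `LOG_DATE_FORMAT`, a zero on the second at `RSS_URI`.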
