To be added later.
Two commits, because I had to experiment to find what worked. https://github.com/berkmancenter/lumendatabase/commit/0e414fcba54851a615dee73f5c36624063e6d4f7 https://github.com/berkmancenter/lumendatabase/commit/2877ea1c05850715405340a7c8680056653a4876
These scripts have banned over 120,000 IPs across their lifetime, though most are not presently banned.
I've removed a couple of values specific to our configuration and indicated where you should replace them with values that make sense in your context.
nginx-lumen-abuse catches IPs that violate a throttling policy; recidive-lumen catches repeat offenders. "lumen" is the name of our app, so you probably want to replace that too :).
[nginx-lumen-abuse]
enabled = true
port = http,https
filter = nginx-lumen-abuse
logpath = $YOUR_LOG_PATH_HERE
bantime = $BAN_LENGTH_IN_SECONDS
# A host is banned if it has generated "maxretry" during the last "findtime" seconds.
findtime = $INTERVAL_IN_SECONDS
maxretry = $NUMBER_THAT_MAKES_SENSE_IN_YOUR_CONTEXT
[recidive-lumen]
enabled = true
filter = recidive-lumen
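# This jail reads fail2ban's own log (typically /var/log/fail2ban.log), not the nginx log.
logpath = $YOUR_FAIL2BAN_LOG_PATH_HERE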
action = iptables-multiport[name=recidive-lumen,port="http,https"]
bantime = $BAN_LENGTH_IN_SECONDS
findtime = $INTERVAL_IN_SECONDS
maxretry = $NUMBER_THAT_MAKES_SENSE_IN_YOUR_CONTEXT
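Because a host is banned once it has produced "maxretry" matching lines within "findtime" seconds, it can help to replay past traffic against candidate values before settling on numbers. The sketch below is a hypothetical helper, would_ban, which is not part of fail2ban. It assumes you have already reduced your log to lines of the form "EPOCH_SECONDS IP", sorted by time. It ignores bantime and reports each IP only the first time it would have crossed the threshold.

```shell
# would_ban FINDTIME MAXRETRY
# Hypothetical helper (not part of fail2ban). Reads "EPOCH_SECONDS IP" lines,
# sorted by time, on stdin and prints each IP the first time it has
# MAXRETRY or more hits inside a sliding window of FINDTIME seconds.
would_ban() {
  awk -v findtime="$1" -v maxretry="$2" '
    BEGIN { findtime += 0; maxretry += 0 }   # force numeric comparisons
    {
      t = $1; ip = $2
      # Append this hit to the per-IP queue of timestamps.
      if (!(ip in head)) { head[ip] = 1; tail[ip] = 0 }
      tail[ip]++
      q[ip, tail[ip]] = t
      # Drop hits that fall outside the window (t - findtime, t].
      while (head[ip] <= tail[ip] && q[ip, head[ip]] <= t - findtime) {
        delete q[ip, head[ip]]
        head[ip]++
      }
      # Report the IP once, when it first reaches the threshold.
      if (tail[ip] - head[ip] + 1 >= maxretry && !(ip in reported)) {
        reported[ip] = 1
        print ip
      }
    }'
}
```

Converting nginx timestamps to epoch seconds depends on your log format, so that step is left out here.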
/etc/fail2ban/filter.d/nginx-lumen-abuse.conf
# Fail2Ban filter to match web requests for selected URLs
#
[Definition]
failregex = ^(www\.)?lumendatabase.org <HOST> \- \S+ \[ \S+\] \"GET $YOUR_HOT_PATH_HERE \S+ \- \-\" (200|429) .+$
            ^(www\.)?lumendatabase.org <HOST> \- \S+ \[ \S+\] \"GET $YOUR_OTHER_HOT_PATH_HERE \S+ \- \-\" (200|429) .+$
ignoreregex =
# DEV Notes:
# Based on apache-botsearch filter
#
# Author: Frantisek Sumsal
/etc/fail2ban/filter.d/recidive-lumen.conf
# Fail2Ban filter for repeat bans
#
# This filter monitors the fail2ban log file, and enables you to add long
# time bans for ip addresses that get banned by fail2ban multiple times.
#
# Reasons to use this: block very persistent attackers for a longer time,
# stop receiving email notifications about the same attacker over and
# over again.
#
# This jail is only useful if you set the 'findtime' and 'bantime' parameters
# in jail.conf to a higher value than the other jails. Also, this jail has its
# drawbacks, namely in that it works only with iptables, or if you use a
# different blocking mechanism for this jail versus others (e.g. hostsdeny
# for most jails, and shorewall for this one).
[INCLUDES]
# Read common prefixes. If any customizations available -- read them from
# common.local
before = common.conf
[Definition]
_daemon = fail2ban\.actions
# The name of the jail that this filter is used for. In jail.conf, name the
# jail using this filter 'recidive-lumen', or change this line!
_jailname = recidive-lumen
# example log line
#2019-04-03 13:49:51,332 fail2ban.actions: WARNING [nginx-lumen-abuse] Ban $EXAMPLE_DODGY_IP
failregex = ^(%(__prefix_line)s|,\d{3} fail2ban.actions:\s+)WARNING\s+\[(?!%(_jailname)s\])(?:.*)\]\s+Ban\s+<HOST>\s*$
ignoreregex =
# Author: Tom Hendrikx, modifications by Amir Caspi
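The recidive-lumen jail works by counting the "Ban" lines that other jails write to the fail2ban log. To estimate how many repeat offenders a given recidive maxretry would catch, you could use a hypothetical helper like the one below. It tallies bans per IP from that log and skips bans issued by recidive-lumen itself, mirroring the negative lookahead in the failregex. It assumes the layout of the example log line above: the jail name in brackets directly before the word "Ban", and the IP directly after it.

```shell
# repeat_offenders LOGFILE MIN_BANS
# Hypothetical helper (not part of fail2ban). Counts "[jail] Ban IP" lines per IP,
# ignoring bans made by the recidive-lumen jail itself, and prints
# "COUNT IP" for every IP banned at least MIN_BANS times, most-banned first.
repeat_offenders() {
  awk -v min="$2" '
    BEGIN { min += 0 }
    {
      for (i = 2; i < NF; i++) {
        # Match "[some-jail] Ban <ip>", but not "Unban" or our own recidive bans.
        if ($i == "Ban" && $(i - 1) ~ /^\[.*\]$/ && $(i - 1) != "[recidive-lumen]") {
          count[$(i + 1)]++
        }
      }
    }
    END { for (ip in count) if (count[ip] >= min) print count[ip], ip }
  ' "$1" | sort -nr
}
```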
Here are the awk scripts I referenced. The numbers ($2 and similar) might have to be changed depending on the format of your log; they should reference the column containing the data of interest (e.g. the second field in my log lines is where IP addresses go). I used these to "read" my logs quickly, looking for data to inform future judgments (e.g. are there IP ranges I should ban outright; which paths on our site need to be throttled).
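Since the right column numbers depend on your log format, one quick way to find them is to print a single log line with each field numbered, split on whitespace the same way awk splits it by default. The function name show_fields is just for illustration.

```shell
# show_fields LOGFILE
# Print the first line of LOGFILE with each awk field ($1, $2, ...) numbered,
# so you can see which column holds the IP, the path, the status code, etc.
show_fields() {
  head -n 1 "$1" | awk '{ for (i = 1; i <= NF; i++) print "$" i " = " $i }'
}
```

Keep in mind that a quoted request such as "GET /path HTTP/1.1" spans several awk fields, which is why the path lands in $8 rather than in a single "request" column.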
Find the IPs that hit you most often:
awk '{ print $2}' $PATH_TO_YOUR_LOGFILE | sort | uniq -c | sort -nr | head -n 20
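For the "are there IP ranges I should ban outright" question, a variation on the command above groups hits by /24 network instead of by single address. This is a sketch that assumes IPv4 addresses in column 2, as in my logs. Anything that isn't a dotted quad (e.g. an IPv6 address) is counted as-is. The function name top_ranges is hypothetical.

```shell
# top_ranges LOGFILE
# Like the top-IPs one-liner, but groups IPv4 addresses in column 2 by /24
# network; anything that is not a dotted quad (e.g. IPv6) is counted as-is.
top_ranges() {
  awk '{
    n = split($2, o, ".")
    if (n == 4) print o[1] "." o[2] "." o[3] ".0/24"
    else print $2
  }' "$1" | sort | uniq -c | sort -nr | head -n 20
}
```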
Count hits per IP over a time range:
grep "01/Apr/2019:12:10" $PATH_TO_YOUR_LOGFILE |awk '{print $2}' |sort |uniq -c |sort -n
Modify the grep for your date/time. Note that you can look at a single second, or a ten-minute range, or an hour, et cetera, by changing how much of the time you write out.
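To decide which minute or hour is worth a closer look with that grep, it can help to first count hits per minute across the whole log and look for spikes. This sketch assumes nginx's default $time_local timestamp layout (e.g. 01/Apr/2019:12:10:05), which matches the grep example above. It locates the timestamp with a regex rather than by column, so it doesn't depend on where the timestamp sits in the line. The function name hits_per_minute is hypothetical.

```shell
# hits_per_minute LOGFILE
# Count requests per minute, keyed by the "DD/Mon/YYYY:HH:MM" prefix of the
# timestamp, and print the 20 busiest minutes.
hits_per_minute() {
  awk '{
    # Find a timestamp such as 01/Apr/2019:12:10:05 anywhere on the line.
    if (match($0, /[0-9][0-9]\/[A-Z][a-z][a-z]\/[0-9][0-9][0-9][0-9]:[0-9][0-9]:[0-9][0-9]/))
      print substr($0, RSTART, 17)   # keep DD/Mon/YYYY:HH:MM, drop seconds
  }' "$1" | sort | uniq -c | sort -nr | head -n 20
}
```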
Find 500 errors:
grep ' 500 ' $PATH_TO_YOUR_LOGFILE | awk '{print $2}'
This prints the IP addresses of requests that got 500 errors. The quotes and the spaces around 500 in the grep are important -- otherwise you'll also match IP addresses, response sizes, and site URLs which contain the substring "500".
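To see why the padding matters, compare a bare pattern with a space-padded one on two made-up log lines in a layout like ours. The first line requests a URL containing "500" and succeeds. The second line actually returns HTTP 500.

```shell
# Two made-up lines: the first requests a URL containing "500" and succeeds;
# the second actually returns HTTP 500.
sample='lumendatabase.org 203.0.113.7 - - [ 01/Apr/2019:12:10:05] "GET /notices/5001 HTTP/1.1 - -" 200 812
lumendatabase.org 198.51.100.2 - - [ 01/Apr/2019:12:10:06] "GET /search HTTP/1.1 - -" 500 97'

printf '%s\n' "$sample" | grep -c '500'     # bare pattern: prints 2
printf '%s\n' "$sample" | grep -c ' 500 '   # space-padded: prints 1
```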
Find the pages on your site that are successfully fetched (i.e. HTTP 200) most often:
grep ' 200 ' $PATH_TO_YOUR_LOGFILE | awk '{print $8}' | sort | uniq -c | sort -n
(...because if I were an attacker I'd find the slow pages and hit them repeatedly.)
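If your nginx log_format includes $request_time (the time nginx spent on the request, in seconds), you can go a step further and rank paths by average response time. This is a sketch under two assumptions you will probably need to adjust: the path is in column 8, as in my logs, and $request_time is the last field ($NF). The function name slowest_paths is hypothetical.

```shell
# slowest_paths LOGFILE
# Average the request time per path and print "AVG_SECONDS HITS PATH" for the
# 20 slowest paths. Assumes (hypothetically) the path in $8 and nginx's
# $request_time as the last field on each line.
slowest_paths() {
  awk '{ total[$8] += $NF; hits[$8]++ }
       END { for (p in total) printf "%.3f %d %s\n", total[p] / hits[p], hits[p], p }' "$1" |
    sort -nr | head -n 20
}
```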
Find the most common 429 responses (by IP and path):
grep ' 429 ' $PATH_TO_YOUR_LOGFILE | awk '{print $2, $8}' | sort | uniq -c | sort -n
Useful if you're returning 429s as part of a throttling policy -- find the IPs that get throttled a lot.