HFSC - Linux traffic shaping's best-kept secret
#!/bin/bash
# As the "bufferbloat" folks have recently re-discovered and/or more widely
# publicized, congestion avoidance algorithms (such as those found in TCP) do
# a great job of allowing network endpoints to negotiate transfer rates that
# maximize a link's bandwidth usage without unduly penalizing any particular
# stream. This allows bulk transfer streams to use the maximum available
# bandwidth without affecting the latency of non-bulk (e.g. interactive)
# streams.
# In other words, TCP lets you have your cake and eat it too -- both fast
# downloads and low latency all at the same time.
# However, this only works if TCP's aforementioned congestion avoidance
# algorithms actually kick in. The most reliable method of signaling
# congestion is to drop packets. (There are other ways, such as ECN, but
# unfortunately they're still not in wide use.)
# Dropping packets to make the network work better is kind of counterintuitive.
# But that's how TCP works, and if you take advantage of that, you can make
# TCP work great.
# Dropping packets gets TCP's attention, and fast. The sending endpoint
# throttles back to avoid further network congestion. In other words, your
# fast download slows down. Then, as long as there's no further congestion,
# the sending endpoint gradually increases the transfer rate. Then the cycle
# repeats. It can get a lot more complex than that simple explanation, but the
# main point is: dropping packets when there's congestion is good.
# Traffic shaping is all about slowing down and/or dropping (or ECN marking)
# packets. The thing is, it's much better for latency to simply drop packets
# than it is to slow them down. Linux has a couple of traffic shapers that
# aren't afraid to drop packets. One of the most well-known is TBF, the Token
# Bucket Filter. Normally it delays packets to hold traffic to a specific rate.
# But it also accepts a "limit" option that caps how much data (in bytes, not
# packets) may wait in its queue. When the limit is exceeded, packets are dropped.
# TBF's simple "tail-drop" algorithm is actually one of the worst kinds of
# "active queue management" (AQM) that you can do. But even so, it can make
# a huge difference. Applying TBF alone (with a short enough limit) can make a
# maddeningly high-latency link usable again in short order.
# TBF's big disadvantage is that it's a "classless" shaper. That means you
# can't prioritize one TCP stream over another. That's where HTB, the
# Hierarchical Token Bucket, comes in. HTB uses the same general algorithm as
# TBF while also allowing you to filter specific traffic to prioritized queues.
# But HTB has a big weakness: it doesn't have a good, easy way of specifying a
# queue limit like TBF does. That means, compared to TBF, HTB is much more
# inclined to slow packets down than to drop them. That hurts latency, badly.
# So now we come to Linux traffic shaping's best kept secret: the HFSC shaper.
# HFSC stands for Hierarchical Fair Service Curve. The Linux implementation is
# complex enough to have prompted a nine-part question about it on Server Fault
# ( http://serverfault.com/questions/105014/does-anyone-really-understand-how-hfsc-scheduling-in-linux-bsd-works ).
# Nonetheless, HFSC can be understood in a simplified way as HTB with limits.
# HFSC allows you to classify traffic (like HTB, unlike TBF), but it also has
# no fear of dropping packets (unlike HTB, like TBF).
# HFSC does a great job of keeping latency low. With it, it's possible to fully
# saturate a link while maintaining perfect non-bulk session interactivity.
# It is the holy grail of traffic shaping, and it's in the stock kernel.
# To get the best results, HFSC should be combined with SFQ (Stochastic
# Fairness Queueing) and optionally an ingress filter. If all three are used,
# it's possible to maintain low-latency interactive sessions even without any
# traffic prioritization. Further adding prioritization then maximizes
# interactivity.
# Here's how it's done:
# set this to your internet-facing network interface:
WAN_INTERFACE=eth1
# set this to your local network interface:
LAN_INTERFACE=eth0
LAN_NETWORK=192.168.1.0/24
# how fast is your downlink?
MAX_DOWNRATE=6144kbit
# how close should we get to max down? e.g. 95%
USE_DOWNPERCENT=0.95
# how fast is your uplink?
MAX_UPRATE=384kbit
# how close should we get to max up? e.g. 90%
USE_UPPERCENT=0.90
# what port do you want to prioritize? e.g. 22 for ssh, 3074 for xbox
INTERACTIVE_PORT=3074
# NOTE: port 22 is prioritized as well, see below for the full list of
# prioritized ports. (this script has been updated to use some loops to make
# adding / removing ports easier, and I've added a number of xbox-related
# ports.)
## now for the magic
# set this to the path to your tc binary
#TC=/usr/sbin/tc
TC=/sbin/tc
# remove any existing qdiscs
# this lets you re-run this script as much as you want to try different
# settings, we're always starting fresh
$TC qdisc del dev $WAN_INTERFACE root 2> /dev/null
$TC qdisc del dev $WAN_INTERFACE ingress 2> /dev/null
$TC qdisc del dev $LAN_INTERFACE root 2> /dev/null
$TC qdisc del dev $LAN_INTERFACE ingress 2> /dev/null
# computations
MAX_UPNUM=`echo $MAX_UPRATE | sed 's/[^0-9]//g'`
MAX_UPBASE=`echo $MAX_UPRATE | sed 's/[0-9]//g'`
MAX_DOWNNUM=`echo $MAX_DOWNRATE | sed 's/[^0-9]//g'`
MAX_DOWNBASE=`echo $MAX_DOWNRATE | sed 's/[0-9]//g'`
NEAR_MAX_UPNUM=`echo "$MAX_UPNUM * $USE_UPPERCENT" | bc | xargs printf "%.0f"`
NEAR_MAX_UPRATE="${NEAR_MAX_UPNUM}${MAX_UPBASE}"
NEAR_MAX_DOWNNUM=`echo "$MAX_DOWNNUM * $USE_DOWNPERCENT" | bc | xargs printf "%.0f"`
NEAR_MAX_DOWNRATE="${NEAR_MAX_DOWNNUM}${MAX_DOWNBASE}"
HALF_MAXUPNUM=$(( $MAX_UPNUM / 2 ))
HALF_MAXUP="${HALF_MAXUPNUM}${MAX_UPBASE}"
HALF_MAXDOWNNUM=$(( $MAX_DOWNNUM / 2 ))
HALF_MAXDOWN="${HALF_MAXDOWNNUM}${MAX_DOWNBASE}"
## mark small packets - i.e. ACKs
## run this iptables command once per reboot (or add it to your regular
## iptables rules); the rest of this script can be re-run as often as you like
## to try different settings (the qdiscs are cleared / reset at the top)
#/usr/sbin/iptables -t mangle -A PREROUTING -p tcp -m length --length 0:128 -j MARK --set-mark 2
# install HFSC under WAN to limit upload
$TC qdisc add dev $WAN_INTERFACE root handle 1: hfsc default 11
$TC class add dev $WAN_INTERFACE parent 1: classid 1:1 hfsc sc rate $NEAR_MAX_UPRATE ul rate $NEAR_MAX_UPRATE
$TC class add dev $WAN_INTERFACE parent 1:1 classid 1:10 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $NEAR_MAX_UPRATE
$TC class add dev $WAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $HALF_MAXUP
#$TC class add dev $WAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $NEAR_MAX_UPRATE
# install ingress filter to limit download to 97% max
MAX_DOWNRATE_INGRESSNUM=`echo "$MAX_DOWNNUM * 0.97" | bc | xargs printf "%.0f"`
MAX_DOWNRATE_INGRESS="${MAX_DOWNRATE_INGRESSNUM}${MAX_DOWNBASE}"
$TC qdisc add dev $WAN_INTERFACE handle ffff: ingress
# NOTE: we route all prioritized traffic through the ingress filter with NO policing;
# that is, we do not police-limit prioritized traffic.
# prioritize interactive port blocks: xbox gaming
for i in $INTERACTIVE_PORT 30000 31000 32000 33000 34000 35000 36000 37000 38000 39000 40000 41000; do
# recent xbox games enjoy using high ports -- the same port numbers which
# client OSes occasionally also use for source ports for bulk traffic (e.g.
# http for streaming) -- so, don't prioritize high source ports, only high
# dest ports (that is, only prioritize streams connecting to ports the gaming
# servers are listening on)
# $TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $i 0xfc00 flowid 1:10
$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $i 0xfc00 flowid 1:10
# policer:
# $TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip sport $i 0xfc00 flowid :1
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip dport $i 0xfc00 flowid :1
done
# do prioritize the low xbox source port block: some games still use this port
# range as a source port (in that case it doesn't matter which port the server
# listens on; the client connects from the low source port to any server port)
$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $INTERACTIVE_PORT 0xfc00 flowid 1:10
# policer:
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip sport $INTERACTIVE_PORT 0xfc00 flowid :1
# prioritize additional single ports: ssh, VPN, WebEx, Google Hangouts
# audio / video. The VPN port (4500) may also be used for AT&T / T-Mobile phone
# calls over Wi-Fi, and Apple FaceTime uses ports around 3xxx, which are already
# prioritized for xbox. In fact it's probably safe to prioritize any low
# source AND dest ports (lower than 20,000) other than 80 and 443 (but we
# don't do that here).
for i in 22 4500 9000 19302 19303 19304 19305 19306 19307 19308 19309; do
$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $i 0xffff flowid 1:10
$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $i 0xffff flowid 1:10
# policer:
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip sport $i 0xffff flowid :1
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip dport $i 0xffff flowid :1
done
# also VPN IP range - set this if you know your corporate VPN IP range
#$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip src 1.2.3.4/24 flowid 1:10
#$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dst 1.2.3.4/24 flowid 1:10
# policer:
#$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip src 1.2.3.4/24 flowid :1
#$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip dst 1.2.3.4/24 flowid :1
# also icmp
$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip protocol 1 0xff flowid 1:10
# policer:
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip protocol 1 0xff flowid :1
# prioritize ack (iptables-marked packets)
$TC filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 2 handle 2 fw flowid 1:10
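# policer: (note that the ingress qdisc runs before iptables PREROUTING, so
# packets arriving here are not yet marked and this fw match likely never hits)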
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 2 handle 2 fw flowid :1
# add SFQ (or fq_codel, which has been in mainline Linux since kernel 3.5)
# (or any anti-bufferbloat qdisc such as cake, if you have an even newer kernel)
# NOTE:
# I tried running without hfsc, using only fq_codel -- it doesn't work as well
# as running with both hfsc and SFQ. That is, xbox gaming latency went to hell
# because fq_codel didn't know when to start dropping packets (without a shaper
# set below the link rate, the queue builds up in the modem, not in fq_codel).
# However, once you have hfsc in the mix, any FQ-like qdisc should work here.
# UPDATE:
# I recommend using SFQ *instead* of fq_codel, at least on slow / 32-bit / embedded
# systems. In my anecdotal testing, fq_codel can slow down streams more than SFQ
# does, and introduce a bit of latency.
$TC qdisc add dev $WAN_INTERFACE parent 1:10 handle 30: sfq perturb 10
$TC qdisc add dev $WAN_INTERFACE parent 1:11 handle 40: sfq perturb 10
#$TC qdisc add dev $WAN_INTERFACE parent 1:10 handle 30: fq_codel
#$TC qdisc add dev $WAN_INTERFACE parent 1:11 handle 40: fq_codel
# enact policer for all other traffic
$TC filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate $MAX_DOWNRATE_INGRESS burst 20k drop flowid :2
# install HFSC under LAN to limit download
$TC qdisc add dev $LAN_INTERFACE root handle 1: hfsc default 11
$TC class add dev $LAN_INTERFACE parent 1: classid 1:1 hfsc sc rate 1000mbit ul rate 1000mbit
$TC class add dev $LAN_INTERFACE parent 1:1 classid 1:10 hfsc sc umax 1540 dmax 5ms rate 900mbit ul rate 900mbit
$TC class add dev $LAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXDOWN ul rate $NEAR_MAX_DOWNRATE
# prioritize local LAN traffic
$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip src $LAN_NETWORK match ip dst $LAN_NETWORK flowid 1:10
# prioritize interactive port blocks: xbox gaming
# see above for why we only prioritize the source port and not the dest port here
# (here the source port is the remote host's port)
for i in $INTERACTIVE_PORT 30000 31000 32000 33000 34000 35000 36000 37000 38000 39000 40000 41000; do
$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $i 0xfc00 flowid 1:10
# $TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $i 0xfc00 flowid 1:10
done
$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $INTERACTIVE_PORT 0xfc00 flowid 1:10
# additional single ports: ssh, VPN, webex, google hangouts
for i in 22 4500 9000 19302 19303 19304 19305 19306 19307 19308 19309; do
$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $i 0xffff flowid 1:10
$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $i 0xffff flowid 1:10
done
# also VPN IP range - set this if you know your corporate VPN IP range
#$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip src 1.2.3.4/24 flowid 1:10
#$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dst 1.2.3.4/24 flowid 1:10
# also icmp
$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip protocol 1 0xff flowid 1:10
## OPTIONAL:
## prioritize any marked packets - use iptables to mark any other prioritized traffic;
## use iptables -t mangle -A PREROUTING -j MARK --set-mark 1 (plus other options)
## NOTE: this is different from ack marking above -- you'd use this with complex iptables rules, e.g. to not
## rate-limit web traffic to/from a specific IP
#$TC filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 2 handle 1 fw flowid 1:10
# add SFQ / fq_codel / cake / etc (see notes above)
$TC qdisc add dev $LAN_INTERFACE parent 1:10 handle 30: sfq perturb 10
$TC qdisc add dev $LAN_INTERFACE parent 1:11 handle 40: sfq perturb 10
#$TC qdisc add dev $LAN_INTERFACE parent 1:10 handle 30: fq_codel
#$TC qdisc add dev $LAN_INTERFACE parent 1:11 handle 40: fq_codel
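The rate arithmetic in the computations section (strip the unit, scale the number, round, re-attach the unit) can be checked on its own before touching any qdiscs. A minimal POSIX sh sketch: `scale_rate` is a hypothetical helper, and awk stands in for bc, which is often missing on small embedded routers.

```shell
#!/bin/sh
# scale_rate RATE FRACTION
# Hypothetical helper mirroring the script's NEAR_MAX_* and ingress
# computations: split "384kbit" into 384 and "kbit", multiply the number
# by FRACTION, round to the nearest integer, and re-attach the unit.
scale_rate() {
    num=$(echo "$1" | sed 's/[^0-9]//g')    # numeric part, e.g. 384
    unit=$(echo "$1" | sed 's/[0-9]//g')    # unit part, e.g. kbit
    awk -v n="$num" -v f="$2" -v u="$unit" 'BEGIN { printf "%.0f%s\n", n * f, u }'
}

# With the script's default rates:
scale_rate 384kbit 0.90     # NEAR_MAX_UPRATE      -> 346kbit
scale_rate 6144kbit 0.95    # NEAR_MAX_DOWNRATE    -> 5837kbit
scale_rate 6144kbit 0.97    # MAX_DOWNRATE_INGRESS -> 5960kbit
```

Because the unit suffix is carried through unchanged, a rate given in mbit is scaled the same way.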

dtaht commented Oct 21, 2013

A great deal of HFSC's supposed benefit comes from the SFQ qdisc, and you will do quite a bit better if you substitute fq_codel for SFQ at higher bandwidths.

But I digress. A huge problem is applying these algorithms to a system with offloads on: for kernels prior to 3.12, you should make sure to run ethtool -K your_device tso off gso off gro off on all devices.
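Applied to the two interfaces the script shapes, that advice might look like the following sketch. `disable_offloads` and the `ETHTOOL` override are hypothetical; by default the sketch only prints the commands, and a real run needs root and a NIC driver that supports toggling these features.

```shell
#!/bin/sh
# disable_offloads: hypothetical helper that turns off TSO, GSO and GRO on
# each named device, as suggested above for kernels older than 3.12.
# ETHTOOL defaults to "echo ethtool" (dry run); set ETHTOOL=ethtool to apply.
ETHTOOL=${ETHTOOL:-"echo ethtool"}

disable_offloads() {
    for dev in "$@"; do
        $ETHTOOL -K "$dev" tso off gso off gro off
    done
}

disable_offloads eth1 eth0   # WAN_INTERFACE and LAN_INTERFACE in the script
```

Afterwards, `ethtool -k eth1` (lowercase k) lists which offloads are actually off.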

Secondly, I benchmarked this code vs cerowrt's htb + fq_codel simplest.qos implementation, at these bitrates, using a variety of simple tests. htb with fq_codel won across the board, both on single stream throughput with competing traffic, and on interactive traffic in general.

Please bench for yourself, the scripts are at https://github.com/dtaht/ceropackages-3.3/tree/master/net/aqm-scripts/files/usr/lib/aqm
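To try that comparison locally, a two-class HTB + fq_codel upload tree mirroring the script's WAN classes could look like this. It is a hypothetical sketch, not cerowrt's simple.qos or simplest.qos (see the repository linked above for those); the 173kbit child rates (half of the 346kbit shaped uplink) and the prio values are assumptions. It only prints the tc commands unless TC is set to the real binary.

```shell
#!/bin/sh
# Hypothetical HTB + fq_codel counterpart to the script's WAN (upload) tree,
# reusing the same handles and class IDs (1:, 1:1, 1:10, 1:11).
# TC defaults to "echo tc" (dry run); set TC=/sbin/tc to apply as root.
TC=${TC:-"echo tc"}
WAN_INTERFACE=eth1
NEAR_MAX_UPRATE=346kbit   # 90% of 384kbit, as the script computes
CHILD_RATE=173kbit        # assumption: half of NEAR_MAX_UPRATE per class

build_htb_upload() {
    $TC qdisc add dev $WAN_INTERFACE root handle 1: htb default 11
    $TC class add dev $WAN_INTERFACE parent 1: classid 1:1 htb \
        rate $NEAR_MAX_UPRATE ceil $NEAR_MAX_UPRATE
    # 1:10 = prioritized traffic; prio 0 gets first claim on spare bandwidth
    $TC class add dev $WAN_INTERFACE parent 1:1 classid 1:10 htb \
        rate $CHILD_RATE ceil $NEAR_MAX_UPRATE prio 0
    # 1:11 = default (bulk) traffic; borrows only after 1:10
    $TC class add dev $WAN_INTERFACE parent 1:1 classid 1:11 htb \
        rate $CHILD_RATE ceil $NEAR_MAX_UPRATE prio 1
    $TC qdisc add dev $WAN_INTERFACE parent 1:10 handle 30: fq_codel
    $TC qdisc add dev $WAN_INTERFACE parent 1:11 handle 40: fq_codel
}

build_htb_upload
```

Because the handles and class IDs match, the script's existing WAN filters (parent 1:0, flowid 1:10) should attach to this tree unchanged.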

Lastly, lines 152 and 153 also look wrong to me.

eqhmcow (owner) commented Apr 4, 2014

I've primarily tested this on a box sitting in my bedroom that has no fan and a 500 MHz VIA CPU, so server-class NICs with offloading weren't even a consideration :)

You're right that SFQ does wonderful things. And I don't claim to understand HFSC except in the most basic way. However, I have observed (with watch -d -- '/sbin/tc -s qdisc') that HFSC does indeed drop packets, and dropping packets is the goal!
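A quick way to total those drop counters instead of watching them scroll by: `sum_drops` is a hypothetical helper that reads `tc -s qdisc` output on stdin and adds up every `dropped N` field. It assumes the usual `Sent ... pkt (dropped N, overlimits ...)` statistics line; the sample input in the last line is synthetic.

```shell
#!/bin/sh
# sum_drops: hypothetical helper that totals the "dropped N" counters in
# `tc -s qdisc` output read from stdin (all qdiscs on the device combined).
sum_drops() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "(dropped") {
                n = $(i + 1)
                sub(/,$/, "", n)   # "3," -> "3"
                total += n
            }
    } END { print total + 0 }'
}

# Real use (needs tc):  /sbin/tc -s qdisc show dev eth1 | sum_drops
# Synthetic sample input:
printf ' Sent 1000 bytes 10 pkt (dropped 3, overlimits 5 requeues 0)\n Sent 500 bytes 4 pkt (dropped 4, overlimits 0 requeues 0)\n' | sum_drops   # -> 7
```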

Lines 150 to 154 are meant to say "don't slow down LAN to LAN traffic, slow down WAN to LAN traffic." I believe those directives function correctly, but please do correct me if I'm wrong.
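What that LAN filter decides can be illustrated in plain sh. Its two clauses (`match ip src $LAN_NETWORK match ip dst $LAN_NETWORK`) sit in one u32 filter, so both must hold: a packet goes to class 1:10 only when source and destination are both inside LAN_NETWORK. The helpers below (`ip_to_int`, `in_network`, `lan_to_lan`) are hypothetical and IPv4-only; they mimic the prefix test, not u32's actual implementation.

```shell
#!/bin/sh
# Hypothetical, IPv4-only illustration of the LAN-side prio 1 filter:
#   u32 match ip src $LAN_NETWORK match ip dst $LAN_NETWORK flowid 1:10

ip_to_int() {
    # dotted quad -> 32-bit integer, e.g. 192.168.1.5 -> 3232235781
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_network() {
    # $1 = address, $2 = network in CIDR form, e.g. 192.168.1.0/24
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

lan_to_lan() {
    # $1 = source address, $2 = destination address; both must match
    in_network "$1" "$LAN_NETWORK" && in_network "$2" "$LAN_NETWORK"
}

LAN_NETWORK=192.168.1.0/24
lan_to_lan 192.168.1.5 192.168.1.20 && echo "LAN to LAN: class 1:10"
lan_to_lan 203.0.113.7 192.168.1.20 || echo "WAN to LAN: not matched by this filter"
```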

Finally -- if you're having to use AQM and you're not doing core internet routing, the best answer is always "buy more bandwidth"! If you can eliminate the buffer by eliminating bottlenecks (rather than enforcing them, as this script does), you eliminate bufferbloat :)

eqhmcow (owner) commented Dec 27, 2016

Merry Christmas! Yes, I'm still using this script, and yes, it still works great.

I've updated it to match my latest port prioritizations, but the core is still the same and still kicks latency's butt.
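The port loops rely on u32's value/mask matching: `match ip dport 3074 0xfc00` matches every port whose top six bits equal those of 3074, i.e. a whole 1024-port block rather than a single port. A hypothetical helper to see which range a value/mask pair covers, assuming a mask made of contiguous leading one bits (like 0xfc00 or 0xffff):

```shell
#!/bin/sh
# port_range PORT MASK: hypothetical helper printing the "first-last" port
# range matched by "u32 match ip sport|dport PORT MASK". Assumes MASK has
# contiguous leading one bits (e.g. 0xfc00, 0xffff).
port_range() {
    first=$(( $1 & $2 ))                 # keep only the masked (fixed) bits
    last=$(( first | (~$2 & 0xffff) ))   # set every free low bit
    echo "$first-$last"
}

port_range 3074 0xfc00    # -> 3072-4095 (INTERACTIVE_PORT block)
port_range 30000 0xfc00   # -> 29696-30719
port_range 22 0xffff      # -> 22-22 (exact port)
```

Run over the xbox loop's list, the twelve high values 30000 through 41000 land in consecutive blocks, so together they cover ports 29696-41983.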

eqhmcow (owner) commented Dec 31, 2016

New Year's update:

I recommend using SFQ instead of fq_codel, at least on slow / 32-bit / embedded systems. In my anecdotal testing, fq_codel can slow down streams more than SFQ does, and introduce a bit of latency.
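Since the script keeps the fq_codel lines commented out next to the SFQ ones, a one-variable switch makes it easy to A/B test this recommendation against dtaht's fq_codel suggestion above. `add_leaf_qdisc` and `LEAF_QDISC` are hypothetical names; TC defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: attach the leaf qdisc for one HFSC class, so SFQ and
# fq_codel can be swapped with a single variable.
# TC defaults to "echo tc" (dry run); set TC=/sbin/tc to apply as root.
TC=${TC:-"echo tc"}

add_leaf_qdisc() {
    # $1 = device, $2 = parent class, $3 = handle, $4 = sfq | fq_codel
    case "$4" in
        sfq)      $TC qdisc add dev "$1" parent "$2" handle "$3" sfq perturb 10 ;;
        fq_codel) $TC qdisc add dev "$1" parent "$2" handle "$3" fq_codel ;;
        *)        echo "unknown leaf qdisc: $4" >&2; return 1 ;;
    esac
}

LEAF_QDISC=${LEAF_QDISC:-sfq}   # the author's recommended default
add_leaf_qdisc eth1 1:10 30: "$LEAF_QDISC"
add_leaf_qdisc eth1 1:11 40: "$LEAF_QDISC"
```

Setting LEAF_QDISC=fq_codel (and TC=/sbin/tc, as root) applies the alternative for real.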

moscato commented Jun 3, 2017

Just found this:
I'm actually using htb+fq_codel, and a maxed-out Steam download adds only 7 milliseconds of latency to an ICMP packet.

On pfSense, I've used HFSC with CoDel, and it performed remarkably worse than htb+fq_codel on Linux.

Maybe the issue is the slow embedded hardware? I'm testing htb+fq_codel on a Ubiquiti EdgeRouter X, which has an 880 MHz MIPS CPU.

eqhmcow (owner) commented Jun 17, 2017

@moscato - are you testing only a maxed-out download? That can cause issues, but bufferbloat on uploads can cause even more latency.

The HFSC rules in the script will kill (heavily throttle) uploads, which is no fun for whatever is trying to back up to the cloud, but is great for other people trying to use the slower link for browsing or gaming.
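For comparison, the classless TBF approach described at the top of the script (one rate, a short queue, tail drop, no prioritization) reduces to a single qdisc. A hypothetical sketch: TBF's limit is counted in bytes, and the 1540-byte burst and 3000-byte limit (roughly 70 ms of queue at 346kbit) are assumptions, not tested values. TC defaults to a dry run.

```shell
#!/bin/sh
# Hypothetical classless TBF upload limiter (tail drop, no prioritization),
# as described at the top of the script. TC defaults to "echo tc" (dry run).
TC=${TC:-"echo tc"}

tbf_upload() {
    # $1 = device, $2 = rate, $3 = queue limit in BYTES (TBF counts bytes)
    $TC qdisc del dev "$1" root 2>/dev/null   # start fresh, like the script
    $TC qdisc add dev "$1" root tbf rate "$2" burst 1540 limit "$3"
}

# 346kbit = 90% of a 384kbit uplink; 3000 bytes is roughly 70 ms of queue
tbf_upload eth1 346kbit 3000
```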

interduo commented Sep 16, 2017

If you've got many hosts (>1000), there is a problem with too many filter checks per packet, which causes high CPU usage.
So I suggest using hash tables.
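A sketch of the hashing-filter pattern from the LARTC HOWTO, adapted to the script's LAN side: the root u32 filter hashes on the last octet of the destination address and jumps straight to one of 256 buckets, so a per-host rule is found by one lookup instead of a linear scan over every filter. The names (`setup_hash`, `add_host`), prio 5, and handle 2: are assumptions for illustration; TC defaults to a dry run.

```shell
#!/bin/sh
# Hypothetical u32 hashing filters for LAN_INTERFACE, following the
# hashing-filter pattern in the LARTC HOWTO. TC defaults to a dry run.
TC=${TC:-"echo tc"}
DEV=eth0                # LAN_INTERFACE
LAN_PREFIX=192.168.1    # first three octets of LAN_NETWORK (a /24)

setup_hash() {
    # the first u32 filter at prio 5 creates the root hash table 800:
    $TC filter add dev $DEV parent 1:0 prio 5 protocol ip u32
    # a second table, 2:, with 256 buckets
    $TC filter add dev $DEV parent 1:0 prio 5 handle 2: protocol ip u32 divisor 256
    # hash on the low byte of the destination address (IP header offset 16)
    # and jump to the matching bucket in table 2:
    $TC filter add dev $DEV protocol ip parent 1:0 prio 5 u32 ht 800:: \
        match ip dst $LAN_PREFIX.0/24 hashkey mask 0x000000ff at 16 link 2:
}

add_host() {
    # $1 = last octet of the host address, $2 = target class
    bucket=$(printf '%x' "$1")   # bucket IDs are written in hex: 42 -> 2a
    $TC filter add dev $DEV protocol ip parent 1:0 prio 5 u32 ht 2:$bucket: \
        match ip dst $LAN_PREFIX.$1 flowid $2
}

setup_hash
add_host 42 1:10
```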
