HFSC - Linux traffic shaping's best kept secret
#!/bin/bash
# As the "bufferbloat" folks have recently re-discovered and/or more widely
# publicized, congestion avoidance algorithms (such as those found in TCP) do
# a great job of allowing network endpoints to negotiate transfer rates that
# maximize a link's bandwidth usage without unduly penalizing any particular
# stream. This allows bulk transfer streams to use the maximum available
# bandwidth without affecting the latency of non-bulk (e.g. interactive)
# streams.
# In other words, TCP lets you have your cake and eat it too -- both fast
# downloads and low latency all at the same time.
# However, this only works if TCP's aforementioned congestion avoidance
# algorithms actually kick in. The most reliable method of signaling
# congestion is to drop packets. (There are other ways, such as ECN, but
# unfortunately they're still not in wide use.)
# Dropping packets to make the network work better is kinda counter-intuitive.
# But, that's how TCP works. And if you take advantage of that, you can make
# TCP work great.
# Dropping packets gets TCP's attention, and fast. The sending endpoint
# throttles back to avoid further network congestion. In other words, your
# fast download slows down. Then, as long as there's no further congestion,
# the sending endpoint gradually increases the transfer rate. Then the cycle
# repeats. It can get a lot more complex than that simple explanation, but the
# main point is: dropping packets when there's congestion is good.
# Traffic shaping is all about slowing down and/or dropping (or ECN marking)
# packets. The thing is, it's much better for latency to simply drop packets
# than it is to slow them down. Linux has a couple of traffic shapers that
# aren't afraid to drop packets. One of the most well-known is TBF, the Token
# Bucket Filter. Normally it slows down packets to a specific rate. But, it
# also accepts a "limit" option to specify the maximum amount of data (in
# bytes) that can be queued. When the limit is exceeded, packets are dropped.
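# As an aside (illustrative only, not run by this script; eth0 and the
# numbers are example values), a standalone TBF setup with a short queue
# looks something like:
#   tc qdisc add dev eth0 root tbf rate 384kbit burst 1540 limit 3000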
# TBF's simple "tail-drop" algorithm is actually one of the worst kinds of
# "active queue management" (AQM) that you can do. But even still, it can make
# a huge difference. Applying TBF alone (with a short enough limit) can make a
# maddeningly high-latency link usable again in short order.
# TBF's big disadvantage is that it's a "classless" shaper. That means you
# can't prioritize one TCP stream over another. That's where HTB, the
# Hierarchical Token Bucket, comes in. HTB uses the same general algorithm as
# TBF while also allowing you to filter specific traffic to prioritized queues.
# But HTB has a big weakness: it doesn't have a good, easy way of specifying a
# queue limit like TBF does. That means, compared to TBF, HTB is much more
# inclined to slow packets rather than to drop them. That hurts latency, bad.
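# As a sketch (illustrative only, not run by this script; eth0, the rates and
# the ssh port are example values), an HTB setup that carves out a class for
# interactive traffic looks something like:
#   tc qdisc add dev eth0 root handle 1: htb default 11
#   tc class add dev eth0 parent 1: classid 1:1 htb rate 384kbit
#   tc class add dev eth0 parent 1:1 classid 1:10 htb rate 192kbit ceil 384kbit
#   tc class add dev eth0 parent 1:1 classid 1:11 htb rate 192kbit ceil 192kbit
#   tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip dport 22 0xffff flowid 1:10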
# So now we come to Linux traffic shaping's best kept secret: the HFSC shaper.
# HFSC stands for Hierarchical Fair Service Curve. The Linux implementation is
# a complex beast, enough so to have a 9-part question about it on serverfault
# ( http://serverfault.com/questions/105014/does-anyone-really-understand-how-hfsc-scheduling-in-linux-bsd-works ).
# Nonetheless, HFSC can be understood in a simplified way as HTB with limits.
# HFSC allows you to classify traffic (like HTB, unlike TBF), but it also has
# no fear of dropping packets (unlike HTB, like TBF).
# HFSC does a great job of keeping latency low. With it, it's possible to fully
# saturate a link while maintaining perfect non-bulk session interactivity.
# It is the holy grail of traffic shaping, and it's in the stock kernel.
# To get the best results, HFSC should be combined with SFQ (Stochastic
# Fairness Queueing) and optionally an ingress filter. If all three are used,
# it's possible to maintain low-latency interactive sessions even without any
# traffic prioritization. Further adding prioritization then maximizes
# interactivity.
# Here's how it's done:
# set this to your internet-facing network interface:
WAN_INTERFACE=eth0
# set this to your local network interface:
LAN_INTERFACE=eth1
# how fast is your downlink?
MAX_DOWNRATE=3072kbit
# how close should we get to max down? e.g. 90%
USE_DOWNPERCENT=0.90
# how fast is your uplink?
MAX_UPRATE=384kbit
# how close should we get to max up? e.g. 80%
USE_UPPERCENT=0.80
# what port do you want to prioritize? e.g. for ssh, use 22
INTERACTIVE_PORT=22
## now for the magic
# remove any existing qdiscs
/sbin/tc qdisc del dev $WAN_INTERFACE root 2> /dev/null
/sbin/tc qdisc del dev $WAN_INTERFACE ingress 2> /dev/null
/sbin/tc qdisc del dev $LAN_INTERFACE root 2> /dev/null
/sbin/tc qdisc del dev $LAN_INTERFACE ingress 2> /dev/null
# computations
MAX_UPNUM=`echo $MAX_UPRATE | sed 's/[^0-9]//g'`
MAX_UPBASE=`echo $MAX_UPRATE | sed 's/[0-9]//g'`
MAX_DOWNNUM=`echo $MAX_DOWNRATE | sed 's/[^0-9]//g'`
MAX_DOWNBASE=`echo $MAX_DOWNRATE | sed 's/[0-9]//g'`
NEAR_MAX_UPNUM=`echo "$MAX_UPNUM * $USE_UPPERCENT" | bc | xargs printf "%.0f"`
NEAR_MAX_UPRATE="${NEAR_MAX_UPNUM}${MAX_UPBASE}"
NEAR_MAX_DOWNNUM=`echo "$MAX_DOWNNUM * $USE_DOWNPERCENT" | bc | xargs printf "%.0f"`
NEAR_MAX_DOWNRATE="${NEAR_MAX_DOWNNUM}${MAX_DOWNBASE}"
HALF_MAXUPNUM=$(( $MAX_UPNUM / 2 ))
HALF_MAXUP="${HALF_MAXUPNUM}${MAX_UPBASE}"
HALF_MAXDOWNNUM=$(( $MAX_DOWNNUM / 2 ))
HALF_MAXDOWN="${HALF_MAXDOWNNUM}${MAX_DOWNBASE}"
# install HFSC under WAN to limit upload
/sbin/tc qdisc add dev $WAN_INTERFACE root handle 1: hfsc default 11
/sbin/tc class add dev $WAN_INTERFACE parent 1: classid 1:1 hfsc sc rate $NEAR_MAX_UPRATE ul rate $NEAR_MAX_UPRATE
/sbin/tc class add dev $WAN_INTERFACE parent 1:1 classid 1:10 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $NEAR_MAX_UPRATE
/sbin/tc class add dev $WAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $HALF_MAXUP
# prioritize interactive ports
/sbin/tc filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $INTERACTIVE_PORT 0xffff flowid 1:10
/sbin/tc filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $INTERACTIVE_PORT 0xffff flowid 1:10
# add SFQ
/sbin/tc qdisc add dev $WAN_INTERFACE parent 1:10 handle 30: sfq perturb 10
/sbin/tc qdisc add dev $WAN_INTERFACE parent 1:11 handle 40: sfq perturb 10
# install ingress filter to limit download to 97% max
MAX_DOWNRATE_INGRESSNUM=`echo "$MAX_DOWNNUM * 0.97" | bc | xargs printf "%.0f"`
MAX_DOWNRATE_INGRESS="${MAX_DOWNRATE_INGRESSNUM}${MAX_DOWNBASE}"
/sbin/tc qdisc add dev $WAN_INTERFACE handle ffff: ingress
/sbin/tc filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip sport $INTERACTIVE_PORT 0xffff flowid :1
/sbin/tc filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip dport $INTERACTIVE_PORT 0xffff flowid :1
/sbin/tc filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate $MAX_DOWNRATE_INGRESS burst 20k drop flowid :2
# install HFSC under LAN to limit download
/sbin/tc qdisc add dev $LAN_INTERFACE root handle 1: hfsc default 11
/sbin/tc class add dev $LAN_INTERFACE parent 1: classid 1:1 hfsc sc rate 1000mbit ul rate 1000mbit
/sbin/tc class add dev $LAN_INTERFACE parent 1:1 classid 1:10 hfsc sc umax 1540 dmax 5ms rate 900mbit ul rate 900mbit
/sbin/tc class add dev $LAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXDOWN ul rate $NEAR_MAX_DOWNRATE
# prioritize interactive ports
/sbin/tc filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $INTERACTIVE_PORT 0xffff flowid 1:10
/sbin/tc filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $INTERACTIVE_PORT 0xffff flowid 1:10
# add SFQ
/sbin/tc qdisc add dev $LAN_INTERFACE parent 1:10 handle 30: sfq perturb 10
/sbin/tc qdisc add dev $LAN_INTERFACE parent 1:11 handle 40: sfq perturb 10

dtaht commented Oct 21, 2013

A great deal of hfsc's supposed benefit comes from the sfq lines at the end.

And if you substitute fq_codel for sfq, it will work better, particularly at higher bandwidths.
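For example (a sketch, assuming a kernel with fq_codel available and the same handles as the script above), the two SFQ lines on the WAN side could become:

    /sbin/tc qdisc add dev $WAN_INTERFACE parent 1:10 handle 30: fq_codel
    /sbin/tc qdisc add dev $WAN_INTERFACE parent 1:11 handle 40: fq_codel

and likewise for the LAN side.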

jdeblese commented:

Why do you give the non-interactive stream an upper limit of only half the bandwidth, and not the near_max rate? You're permanently reserving half your bandwidth, but doesn't the rt curve of HFSC already guarantee that the interactive session will get its minimum bandwidth when it needs it and within the specified delay?

eqhmcow commented Apr 4, 2014

@jdeblese - the name of the game is stopping bufferbloat, and the upload buffers are usually the most sensitive to bloat and can really hurt latency when they start to fill. So, we don't want any stream or streams to start pushing anywhere near the max, because they will step over the limit before we can start dropping, and in doing so they will fill buffers and kill interactivity.

We don't have a hard limit on upload like we do with the ingress filter on download, so we need to tell HFSC that we absolutely don't want streams anywhere near max up, so it has time to recognize streams that are trying to get there and squelch them.

That said, if you find different numbers work better for you, go for it! Try it and see :)

Also, a newer version of this script is at https://gist.github.com/eqhmcow/939373 . It has some better integration with iptables and prioritizes ACKs and other small traffic.

eqhmcow commented Apr 4, 2014

@dtaht - You're right that SFQ does wonderful things. And, I don't claim to understand HFSC except in the most basic way. However, I have observed (with watch -d -- '/sbin/tc -s qdisc') that HFSC does indeed drop packets; and dropping packets is the goal!

fq_codel may indeed be better; I haven't tried it.

Finally -- if you're having to use AQM and you're not doing core internet routing, the best answer is always "buy more bandwidth" ! If you can eliminate the buffer by eliminating bottlenecks (rather than enforcing them, as this script does), you eliminate bufferbloat :)

Is the only way to implement this on the download side to do it system-wide? I.e., can the ingress filtering be applied on a user-by-user basis rather than just limiting all users' traffic?

eqhmcow commented Aug 28, 2015

You can use iptables to mark packets that belong (or don't belong) to particular users and then direct them into different tc classids.
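For example (a sketch only; the LAN host address 192.168.1.50, the mark value 12, the classid 1:12 and the 512kbit rate are made-up values, and this assumes the LAN-side HFSC tree from the script above):

    # add a per-user class under the existing LAN HFSC hierarchy
    /sbin/tc class add dev $LAN_INTERFACE parent 1:1 classid 1:12 hfsc sc rate 512kbit ul rate 512kbit
    # mark forwarded packets headed to that user's machine
    iptables -t mangle -A FORWARD -d 192.168.1.50 -j MARK --set-mark 12
    # steer marked packets into the per-user class
    /sbin/tc filter add dev $LAN_INTERFACE parent 1:0 protocol ip prio 2 handle 12 fw flowid 1:12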

Thanks, a nice starting point for my own scripts. :)

eqhmcow commented Dec 27, 2016

Merry Christmas! Yes, I'm still using this script and yes, it still works great. (I'm the original author, but this fork ranks higher in Google for some reason)

I've updated the script to match my latest port prioritizations, but the core is still the same and still kicks latency's butt.

see https://gist.github.com/eqhmcow/939373 for the latest version

moscato commented Jun 9, 2017

"Finally -- if you're having to use AQM and you're not doing core internet routing, the best answer is always "buy more bandwidth" ! If you can eliminate the buffer by eliminating bottlenecks (rather than enforcing them, as this script does), you eliminate bufferbloat :)"

That's not really true.

There will always be a bigger fish.

You can have bufferbloat on gigabit symmetrical connections.

An example: Steam

Steam will consume any amount of bandwidth you throw at it for the duration of a game download, which can exceed 50 gigs at times.

It'll open 20+ connections that together will consume anything you throw at them.

fq_codel will limit the queue length intelligently, allowing new traffic to get through without pain.

Currently I have HTB with one up queue and one down queue, each with fq_codel, and I don't need any form of prioritization, as it sanely handles my bandwidth and buffers.
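For reference, a sketch of that kind of setup on the upload side (eth0 and the rate are made-up values; the download side would be shaped the same way on the LAN interface or via an ifb device):

    /sbin/tc qdisc add dev eth0 root handle 1: htb default 10
    /sbin/tc class add dev eth0 parent 1: classid 1:10 htb rate 950mbit
    /sbin/tc qdisc add dev eth0 parent 1:10 handle 110: fq_codel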

You mentioned HFSC drops packets when congestion happens, and HTB doesn't.

fq_codel, which can work with HTB, drops packets when congestion happens, so that benefit in favor of HFSC is lost.

eqhmcow commented Jun 17, 2017

@moscato - "lost" is a strong word

HFSC rules let you specify what you think your max upload or download is, rather than trusting fq_codel to find them for you.

In my anecdotal testing, I cannot play low-latency games with only fq_codel prioritizing the link; I can easily game with my HFSC script turned on. To be fair, I haven't tried htb + fq_codel.

To be honest, I do find it rather odd how many people advocate for fq_codel; I imagine a no-knobs solution to be less optimal than one where you set your limits up front and then enforce them. But, to each their own; I have no faith in fq_codel, but I'm perfectly happy if other people have found it useful.

@eqhmcow - I think you're confusing the purpose of work-conserving and non-work-conserving schedulers.

fq_codel is a work-conserving scheduler. It will not work right if the "narrow" point (the bottleneck) is somewhere it is not directly attached to. And even if it is, it will not prioritize; it divides bandwidth proportionally among flows and maintains low latency within each flow. Of course it will not let you play games while you saturate the line with downloads (because of packet dropping in the buffer of the given game's flow). So the easy solution is to attach it to the leaves of a non-work-conserving scheduler (HFSC or HTB), and prioritize traffic into several classes of that non-work-conserving scheduler (as you do).

fq_codel is not ideal (and explaining why would take much more time than I am willing to sacrifice here), but it is currently the best the Linux kernel offers. The main selling point of fq_codel is not the "no knobs" approach, but the proportional dropping of packets in flows based on the increasing delay of packets in a given flow beyond some limit (5 ms), combined with setting ECN bits (extremely helpful, and it can be disabled).

And a note on SFQ: NEVER use SFQ with perturb! Or better, never use SFQ at all when the much better fq_codel exists. When the hashing function changes at regular "perturb" intervals, there is a chance that packets within a flow get reordered, which is a BIG NO NO.

@eqhmcow @bradoaks Thanks a bunch for these scripts! They pointed me in the direction of HFSC and fq_codel, which are indeed great schedulers.

I updated my SuperShaper-SOHO solution to use HFSC and fq_codel based on your scripts above and some additional sources. I even wrote a blog post about this transition from HTB/SFQ to HFSC/fq_codel.

One of the benefits of my solution (which is based on yours) is that mine only uses the ul service curve on the root class, allowing flows to borrow bandwidth from each other when the link is not fully saturated. I also don't use the rt service curve, because I couldn't understand the math involved. But just using link-sharing (ls) is still, in my opinion, much better than using HTB and SFQ. My latency is now controlled much better. Thanks a lot!
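For illustration (a sketch with made-up interface and rates, not copied from SuperShaper-SOHO itself), an HFSC tree that puts ul only on the root class and uses plain ls curves on the leaves might look like:

    /sbin/tc qdisc add dev eth0 root handle 1: hfsc default 11
    /sbin/tc class add dev eth0 parent 1: classid 1:1 hfsc ls m2 950kbit ul m2 950kbit
    /sbin/tc class add dev eth0 parent 1:1 classid 1:10 hfsc ls m2 700kbit
    /sbin/tc class add dev eth0 parent 1:1 classid 1:11 hfsc ls m2 250kbit

When the link is saturated, the leaves share bandwidth in proportion to their ls curves; when it isn't, either leaf can borrow up to the root's 950kbit cap.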
