@bradoaks
Forked from eqhmcow/hfsc-shape.sh
Created April 25, 2011 14:51
HFSC - linux traffic shaping's best kept secret
#!/bin/bash
# As the "bufferbloat" folks have recently re-discovered and/or more widely
# publicized, congestion avoidance algorithms (such as those found in TCP) do
# a great job of allowing network endpoints to negotiate transfer rates that
# maximize a link's bandwidth usage without unduly penalizing any particular
# stream. This allows bulk transfer streams to use the maximum available
# bandwidth without affecting the latency of non-bulk (e.g. interactive)
# streams.
# In other words, TCP lets you have your cake and eat it too -- both fast
# downloads and low latency all at the same time.
# However, this only works if TCP's afore-mentioned congestion avoidance
# algorithms actually kick in. The most reliable method of signaling
# congestion is to drop packets. (There are other ways, such as ECN, but
# unfortunately they're still not in wide use.)
# Dropping packets to make the network work better is kinda counter-intuitive.
# But, that's how TCP works. And if you take advantage of that, you can make
# TCP work great.
# Dropping packets gets TCP's attention and fast. The sending endpoint
# throttles back to avoid further network congestion. In other words, your
# fast download slows down. Then, as long as there's no further congestion,
# the sending endpoint gradually increases the transfer rate. Then the cycle
# repeats. It can get a lot more complex than that simple explanation, but the
# main point is: dropping packets when there's congestion is good.
# Traffic shaping is all about slowing down and/or dropping (or ECN marking)
# packets. The thing is, it's much better for latency to simply drop packets
# than it is to slow them down. Linux has a couple of traffic shapers that
# aren't afraid to drop packets. One of the most well-known is TBF, the Token
# Bucket Filter. Normally it slows down packets to a specific rate. But, it
# also accepts a "limit" option to specify the maximum number of bytes that
# may be queued. When the limit is exceeded, packets are dropped.
# TBF's simple "tail-drop" algorithm is actually one of the worst kinds of
# "active queue management" (AQM) that you can do. But even still, it can make
# a huge difference. Applying TBF alone (with a short enough limit) can make a
# maddeningly high-latency link usable again in short order.
# TBF's big disadvantage is that it's a "classless" shaper. That means you
# can't prioritize one TCP stream over another. That's where HTB, the
# Hierarchical Token Bucket, comes in. HTB uses the same general algorithm as
# TBF while also allowing you to filter specific traffic to prioritized queues.
# But HTB has a big weakness: it doesn't have a good, easy way of specifying a
# queue limit like TBF does. That means, compared to TBF, HTB is much more
# inclined to slow packets rather than to drop them. That hurts latency, bad.
# So now we come to Linux traffic shaping's best kept secret: the HFSC shaper.
# HFSC stands for Hierarchical Fair Service Curve. The Linux implementation is
# a complex beast, enough so to have a 9 part question about it on serverfault
# ( http://serverfault.com/questions/105014/does-anyone-really-understand-how-hfsc-scheduling-in-linux-bsd-works ).
# Nonetheless, HFSC can be understood in a simplified way as HTB with limits.
# HFSC allows you to classify traffic (like HTB, unlike TBF), but it also has
# no fear of dropping packets (unlike HTB, like TBF).
# HFSC does a great job of keeping latency low. With it, it's possible to fully
# saturate a link while maintaining perfect non-bulk session interactivity.
# It is the holy grail of traffic shaping, and it's in the stock kernel.
# To get the best results, HFSC should be combined with SFQ (Stochastic
# Fairness Queueing) and optionally an ingress filter. If all three are used,
# it's possible to maintain low-latency interactive sessions even without any
# traffic prioritization. Further adding prioritization then maximizes
# interactivity.
# Here's how it's done:
# set this to your internet-facing network interface:
WAN_INTERFACE=eth0
# set this to your local network interface:
LAN_INTERFACE=eth1
# how fast is your downlink?
MAX_DOWNRATE=3072kbit
# how close should we get to max down? e.g. 90%
USE_DOWNPERCENT=0.90
# how fast is your uplink?
MAX_UPRATE=384kbit
# how close should we get to max up? e.g. 80%
USE_UPPERCENT=0.80
# what port do you want to prioritize? e.g. for ssh, use 22
INTERACTIVE_PORT=22
## now for the magic
# remove any existing qdiscs
/sbin/tc qdisc del dev $WAN_INTERFACE root 2> /dev/null
/sbin/tc qdisc del dev $WAN_INTERFACE ingress 2> /dev/null
/sbin/tc qdisc del dev $LAN_INTERFACE root 2> /dev/null
/sbin/tc qdisc del dev $LAN_INTERFACE ingress 2> /dev/null
# computations
MAX_UPNUM=`echo $MAX_UPRATE | sed 's/[^0-9]//g'`
MAX_UPBASE=`echo $MAX_UPRATE | sed 's/[0-9]//g'`
MAX_DOWNNUM=`echo $MAX_DOWNRATE | sed 's/[^0-9]//g'`
MAX_DOWNBASE=`echo $MAX_DOWNRATE | sed 's/[0-9]//g'`
NEAR_MAX_UPNUM=`echo "$MAX_UPNUM * $USE_UPPERCENT" | bc | xargs printf "%.0f"`
NEAR_MAX_UPRATE="${NEAR_MAX_UPNUM}${MAX_UPBASE}"
NEAR_MAX_DOWNNUM=`echo "$MAX_DOWNNUM * $USE_DOWNPERCENT" | bc | xargs printf "%.0f"`
NEAR_MAX_DOWNRATE="${NEAR_MAX_DOWNNUM}${MAX_DOWNBASE}"
HALF_MAXUPNUM=$(( $MAX_UPNUM / 2 ))
HALF_MAXUP="${HALF_MAXUPNUM}${MAX_UPBASE}"
HALF_MAXDOWNNUM=$(( $MAX_DOWNNUM / 2 ))
HALF_MAXDOWN="${HALF_MAXDOWNNUM}${MAX_DOWNBASE}"
# install HFSC under WAN to limit upload
/sbin/tc qdisc add dev $WAN_INTERFACE root handle 1: hfsc default 11
/sbin/tc class add dev $WAN_INTERFACE parent 1: classid 1:1 hfsc sc rate $NEAR_MAX_UPRATE ul rate $NEAR_MAX_UPRATE
/sbin/tc class add dev $WAN_INTERFACE parent 1:1 classid 1:10 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $NEAR_MAX_UPRATE
/sbin/tc class add dev $WAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXUP ul rate $HALF_MAXUP
# prioritize interactive ports
/sbin/tc filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $INTERACTIVE_PORT 0xffff flowid 1:10
/sbin/tc filter add dev $WAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $INTERACTIVE_PORT 0xffff flowid 1:10
# add SFQ
/sbin/tc qdisc add dev $WAN_INTERFACE parent 1:10 handle 30: sfq perturb 10
/sbin/tc qdisc add dev $WAN_INTERFACE parent 1:11 handle 40: sfq perturb 10
# install ingress filter to limit download to 97% max
MAX_DOWNRATE_INGRESSNUM=`echo "$MAX_DOWNNUM * 0.97" | bc | xargs printf "%.0f"`
MAX_DOWNRATE_INGRESS="${MAX_DOWNRATE_INGRESSNUM}${MAX_DOWNBASE}"
/sbin/tc qdisc add dev $WAN_INTERFACE handle ffff: ingress
/sbin/tc filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip sport $INTERACTIVE_PORT 0xffff flowid :1
/sbin/tc filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 1 u32 match ip dport $INTERACTIVE_PORT 0xffff flowid :1
/sbin/tc filter add dev $WAN_INTERFACE parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate $MAX_DOWNRATE_INGRESS burst 20k drop flowid :2
# install HFSC under LAN to limit download
/sbin/tc qdisc add dev $LAN_INTERFACE root handle 1: hfsc default 11
/sbin/tc class add dev $LAN_INTERFACE parent 1: classid 1:1 hfsc sc rate 1000mbit ul rate 1000mbit
/sbin/tc class add dev $LAN_INTERFACE parent 1:1 classid 1:10 hfsc sc umax 1540 dmax 5ms rate 900mbit ul rate 900mbit
/sbin/tc class add dev $LAN_INTERFACE parent 1:1 classid 1:11 hfsc sc umax 1540 dmax 5ms rate $HALF_MAXDOWN ul rate $NEAR_MAX_DOWNRATE
# prioritize interactive ports
/sbin/tc filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip sport $INTERACTIVE_PORT 0xffff flowid 1:10
/sbin/tc filter add dev $LAN_INTERFACE protocol ip parent 1:0 prio 1 u32 match ip dport $INTERACTIVE_PORT 0xffff flowid 1:10
# add SFQ
/sbin/tc qdisc add dev $LAN_INTERFACE parent 1:10 handle 30: sfq perturb 10
/sbin/tc qdisc add dev $LAN_INTERFACE parent 1:11 handle 40: sfq perturb 10
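
The "computations" step in the script splits a rate such as 384kbit into its number and unit and scales the number by a percentage, using sed and bc. Below is a minimal sketch of the same arithmetic in pure bash. The function name scale_rate is hypothetical (it is not part of the script), and it assumes the percentage is given as a whole number (80) rather than as a fraction (0.80).

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script:
# scale a tc rate string such as "384kbit" by a whole-number percentage.
scale_rate() {
    local rate=$1 percent=$2
    local num=${rate%%[!0-9]*}    # leading digits, e.g. 384
    local unit=${rate#"$num"}     # remaining suffix, e.g. kbit
    # Integer arithmetic, rounded to the nearest whole number.
    echo "$(( (num * percent + 50) / 100 ))${unit}"
}

scale_rate 384kbit 80     # prints 307kbit  (upload ceiling)
scale_rate 3072kbit 90    # prints 2765kbit (download ceiling)
```

With the script's default settings these are the values that end up in NEAR_MAX_UPRATE and NEAR_MAX_DOWNRATE.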
@eqhmcow

eqhmcow commented Apr 4, 2014

@jdeblese - the name of the game is stopping bufferbloat, and the upload buffers are usually the most sensitive to bloat and can really hurt latency when they start to fill. So, we don't want any stream or streams to start pushing anywhere near the max, because they will step over before we can start dropping, and in doing so they will fill buffers and kill interactivity.

We don't have a hard limit on upload like we do with the ingress filter on download, so we need to tell HFSC that we absolutely don't want streams anywhere near max up, so it has time to recognize streams that are trying to get there and squelch them.

That said, if you find different numbers work better for you, go for it! Try it and see :)

Also, a newer version of this script is at https://gist.github.com/eqhmcow/939373 . It has some better integration with iptables and prioritizes ACKs and other small traffic.

@eqhmcow

eqhmcow commented Apr 4, 2014

@dtaht - You're right that SFQ does wonderful things. And, I don't claim to understand HFSC except in the most basic way. However, I have observed (with watch -d -- '/sbin/tc -s qdisc' ) that HFSC does indeed drop packets; and dropping packets is the goal!
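
For anyone who wants a single number instead of watching the counters scroll, here is a small sketch that adds up the "dropped" counters from tc -s qdisc output. The function name sum_dropped is hypothetical, and it assumes the usual statistics line of the form "Sent ... (dropped N, overlimits ... requeues ...)".

```shell
#!/bin/bash
# Hypothetical helper: sum every "dropped N" counter found on stdin.
sum_dropped() {
    local total=0 line
    local re='dropped ([0-9]+)'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            total=$(( total + ${BASH_REMATCH[1]} ))
        fi
    done
    echo "$total"
}

# Usage on a live system (assumes the interface is eth0 and iproute2 is installed):
#   /sbin/tc -s qdisc show dev eth0 | sum_dropped
```

A total that keeps growing while the link is saturated suggests the shaper, rather than some buffer further down the line, is doing the dropping.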

fq_codel may indeed be better; I haven't tried it.

Finally -- if you're having to use AQM and you're not doing core internet routing, the best answer is always "buy more bandwidth" ! If you can eliminate the buffer by eliminating bottlenecks (rather than enforcing them, as this script does), you eliminate bufferbloat :)

@JohnBruce

Is the only way to implement this in the download context system-wide? I.e., can the ingress filtering be applied on a user-by-user level rather than just limiting all user traffic?

@eqhmcow

eqhmcow commented Aug 28, 2015

You can use iptables to mark packets for (or not for) particular users and then direct them into different tc classids.
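
A rough sketch of that idea, written as a dry run that only prints the commands. Everything specific here is an assumption for illustration: the host address 192.168.1.50, the mark value 6, the function name print_per_host_rules, and the premise that "user" means a LAN host whose downloads leave through the LAN-side HFSC tree from the script (class 1:10).

```shell
#!/bin/bash
# Dry-run sketch: prints the commands instead of running them.
LAN_INTERFACE=${LAN_INTERFACE:-eth1}

# Hypothetical helper: mark traffic forwarded to one host, then classify by mark.
print_per_host_rules() {
    local host=$1 mark=$2 classid=$3
    # Mark packets being forwarded to this host (download direction).
    echo "iptables -t mangle -A FORWARD -d $host -j MARK --set-mark $mark"
    # Steer packets carrying that mark into the chosen class with an fw filter.
    echo "tc filter add dev $LAN_INTERFACE parent 1: protocol ip prio 2 handle $mark fw flowid $classid"
}

print_per_host_rules 192.168.1.50 6 1:10
```

Review the printed lines and run them as root once they match your setup. Additional hosts would each get their own mark and class.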

@RubenKelevra

Thanks, nice start for my own scripts. :)

@eqhmcow

eqhmcow commented Dec 27, 2016

merry Christmas! yes, I'm still using this script and yes, it still works great. (I'm the original author, but this fork ranks higher in google for some reason)

I've updated the script to match my latest port prioritizations, but the core is still the same and still kicks latency's butt.

see https://gist.github.com/eqhmcow/939373 for the latest version

@ScottRosenberg2

"Finally -- if you're having to use AQM and you're not doing core internet routing, the best answer is always "buy more bandwidth" ! If you can eliminate the buffer by eliminating bottlenecks (rather than enforcing them, as this script does), you eliminate bufferbloat :)"

That's not really true.

There will always be a bigger fish.

You can have bufferbloat on gigabit symmetrical connections.

An example: Steam

Steam will consume any amount of bandwidth you throw at it, for the duration of a game download, which can exceed 50 gigs at times.

It'll throw out 20+ connections that will consume anything you throw at it.

Fq_Codel will limit the queue length intelligently allowing new traffic to go through without pain.

Currently: I have htb with one up queue, and one down queue, with fq_codel, and I don't need any form of prioritization, as it sanely handles my bandwidth and buffers.

You mentioned hfsc drops packets when congestion happens, and htb doesn't

fq_codel, which can work with htb, drops packets when congestion happens, so that benefit in favor of hfsc is lost
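
For readers who want to try the arrangement described in this comment (one HTB queue per direction with fq_codel underneath), a minimal dry-run sketch might look like the following. The interface names, the rates and the function name print_htb_fq_codel are assumptions for illustration, not the commenter's actual configuration.

```shell
#!/bin/bash
# Dry-run sketch: prints the commands instead of running them.
# Hypothetical helper: one HTB class capped at a fixed rate, with fq_codel as its leaf.
print_htb_fq_codel() {
    local dev=$1 rate=$2
    echo "tc qdisc add dev $dev root handle 1: htb default 10"
    echo "tc class add dev $dev parent 1: classid 1:10 htb rate $rate ceil $rate"
    echo "tc qdisc add dev $dev parent 1:10 handle 10: fq_codel"
}

print_htb_fq_codel eth0 3mbit   # upload, shaped on the WAN interface
print_htb_fq_codel eth1 6mbit   # download, shaped on the LAN interface
```

HTB supplies the rate limit here and fq_codel manages the queue behind it, so no prioritization filters are involved.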

@eqhmcow

eqhmcow commented Jun 17, 2017

@moscato - "lost" is a strong word

HFSC rules let you specify what you think your max upload or download is, rather than trust fq_codel to find them for you.

In my anecdotal testing, I cannot play low-latency games with only fq_codel prioritizing the link; I can easily game with my HFSC script turned on. To be fair, I haven't tried htb + fq_codel.

To be honest I do find it rather odd the number of people who advocate for fq_codel; I imagine a no-knobs solution to be less optimal than one where you set up your limits up front, and then enforce them. But, to each their own; I have no faith in fq_codel, but I'm perfectly happy if other people have found it useful.

@karel-un

@eqhmcow - I think you are confusing the purposes of work-conserving and non-work-conserving schedulers.

fq_codel is a work-conserving scheduler. It will not work properly if the bottleneck is somewhere it is not directly attached to. Even when it is attached at the bottleneck, it does not prioritize; it divides bandwidth proportionally among flows and keeps latency low within each flow. Of course it will not let you play games while you saturate the line with downloads (because packets get dropped in the buffer of the game's flow). So the easy solution is to attach it to the leaves of a non-work-conserving scheduler (HFSC or HTB), and to prioritize traffic into several classes of that scheduler (as you do).

fq_codel is not ideal (explaining why would take more time than I am willing to spend here), but it is currently the best the Linux kernel offers. The main selling point of fq_codel is not that it is a "no knobs" solution, but that it drops packets proportionally across flows once the delay of packets in a given flow rises above a target (5ms), combined with setting ECN bits (extremely helpful, and it can be disabled).

A note on SFQ: NEVER use SFQ with perturb! Or better, never use SFQ at all when the much better fq_codel exists. When the hashing function changes at regular "perturb" intervals, there is a chance that packets within a flow get reordered, which is a BIG NO-NO.
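
Applied to the gist's script, that suggestion would mean swapping the two "sfq perturb 10" leaves for fq_codel. A minimal dry-run sketch, assuming the class IDs 1:10 and 1:11 and the handles 30: and 40: from the script; the function name print_fq_codel_leaves is hypothetical:

```shell
#!/bin/bash
# Dry-run sketch: prints the commands instead of running them.
# Hypothetical helper: attach fq_codel to the two HFSC leaf classes from the script.
print_fq_codel_leaves() {
    local dev=$1
    local ecn=${2:-ecn}   # pass "noecn" to turn ECN marking off
    echo "tc qdisc add dev $dev parent 1:10 handle 30: fq_codel $ecn"
    echo "tc qdisc add dev $dev parent 1:11 handle 40: fq_codel $ecn"
}

print_fq_codel_leaves eth0         # WAN side
print_fq_codel_leaves eth1 noecn   # LAN side, ECN marking disabled
```

The HFSC classes keep doing the rate limiting and prioritization, while fq_codel takes over per-flow fairness and queue management at the leaves.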

@robinsmidsrod

@eqhmcow @bradoaks Thanks a bunch for these scripts! They pointed me in the direction of HFSC and fq_codel, which are indeed great schedulers.

I updated my SuperShaper-SOHO solution to use HFSC and fq_codel based on your scripts above and some additional sources. I even wrote a blog post about this transition from HTB/SFQ to HFSC/fq_codel.

One of the benefits of my solution based on your solution is that mine only uses the ul service curve on the root class, allowing any flow to borrow from each other when the link is not fully saturated. I also don't use the rt service curve, because I couldn't understand the math involved. But just using link-sharing (ls) is still, in my opinion, much better than using HTB and SFQ. My latency is now controlled much better. Thanks a lot!
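
To illustrate the layout described above (an upper limit only on the root class, link-sharing curves on the children), here is a minimal dry-run sketch. It is not taken from SuperShaper-SOHO; the function name print_hfsc_ls_tree and the rates are assumptions, with 307kbit chosen because it is 80% of the 384kbit uplink used in the gist.

```shell
#!/bin/bash
# Dry-run sketch: prints the commands instead of running them.
# Hypothetical helper: HFSC tree with ul on the root class only and ls on the leaves.
print_hfsc_ls_tree() {
    local dev=$1 link=$2 interactive=$3 bulk=$4
    echo "tc qdisc add dev $dev root handle 1: hfsc default 11"
    # Only the root class carries an upper limit (ul).
    echo "tc class add dev $dev parent 1: classid 1:1 hfsc ls rate $link ul rate $link"
    # Leaves use link-sharing (ls) only, so an idle leaf's share can be borrowed.
    echo "tc class add dev $dev parent 1:1 classid 1:10 hfsc ls rate $interactive"
    echo "tc class add dev $dev parent 1:1 classid 1:11 hfsc ls rate $bulk"
}

print_hfsc_ls_tree eth0 307kbit 200kbit 107kbit
```

With ls alone, the leaf rates act as relative shares of the root's bandwidth rather than hard guarantees, which is what lets either class use the whole link when the other is idle.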

@eqhmcow

eqhmcow commented Dec 1, 2020

I'm glad this has helped people over the years!

However, this script is now obsolete. Use tc_cake instead. It really is amazing, and the bufferbloat people deserve an enormous amount of kudos for creating it and getting it into mainline linux. Thanks also to the Red Hat linux company for employing some of those people.

Cake is dead simple to use:

WAN=eth1
LAN=eth0

/usr/sbin/tc qdisc del dev $WAN root 2> /dev/null
/usr/sbin/tc qdisc del dev $LAN root 2> /dev/null

BANDWIDTH=6mbit
UPBANDWIDTH=3mbit
/usr/sbin/tc qdisc add dev $WAN handle 1: root cake besteffort bandwidth $UPBANDWIDTH internet nat egress ack-filter dual-srchost ethernet
/usr/sbin/tc qdisc add dev $LAN handle 1: root cake besteffort bandwidth $BANDWIDTH internet ingress dual-dsthost ethernet

@signalfrax

signalfrax commented Dec 14, 2020

@eqhmcow Can you help me out with a full script that does the following?

  • Ensure that the bandwidth is shared equally across devices in a network
  • Eliminate/reduce bufferbloat.

I'm using a Debian machine as my router for a DSL connection at home.
Here is the script that I copied from the internet that does the first point.

#!/bin/bash
##
# Dan Siemon <dan@coverfire.com>
# http://www.coverfire.com
#
# License: Affero GPLv3
#
# This script is designed to be used on a router placed before a bottleneck link.
# Set the rate to be slightly below the bottleneck rate so that the router
# owns the queue. That is, there is no queueing in the DSL or cable device.
#
# This script attempts to create per-host fairness on the network
# and for each host three priority classes. Per-host fairness is created
# by having NUM_HOST_BUCKETS classes and hashing hosts across them. Set
# NUM_HOST_BUCKETS to something sane for your network.
#
# Experimental results can be found at:
# https://www.coverfire.com/archives/2013/01/01/improving-my-home-internet-performance/
#
# The hierarchy looks like:
#
# ASCII:
#
#                           Interface
#                |
#                 HTB 1:1
#                 /     \
#            Host Bucket 1  .. NUM_HOST_BUCKETS [Classes 1:10-1:(10+NUM_HOST_BUCKETS)]
#                        |
#                       DRR
#            /    |    \
#         High Normal Low [DRR: With three classes]
#            |
#                Leaf QDisc [Choose the type of the leaf QDisc below]
#
# The tree is created and the QDiscs are named in depth first order.
#
# TODO
# - Add IPv6 support. Should just require additional filters. Note that the flow filter
#   automatically reaches through some tunnels like IP-IPv6 so if you are using IPv6 via
#   a tunnel this script should already have the correct behavior.
#
######################
# Config
######################

/usr/bin/logger --tag qos --id "Applying QOS on PPP connection..."

TC="/sbin/tc"
#TC=`which tc`

#_DEBUG="on"
#_CDEBUG="on"

DEVICE="ppp0"

# The number of host buckets. All hosts are hashed into one of these buckets
# so you'll want this to approximate (but probably be lower than) the number
# of hosts in your network.
NUM_HOST_BUCKETS=8

# The number of flow buckets within each high, normal and low class.
# If SFQ, SFB or FQ_CODEL are used this value is not used as these QDiscs
# have many embedded queues.
NUM_FLOW_BUCKETS=32

####
# Bandwidth rates
####
# All rates are kbit/sec.
# RATE should be set to just under your link rate.
RATE="32999"


####
# Queue size
####
# Size the queue. Only used with the simple FIFO QDiscs
# ie not SFQ, FQ_CODEL. Fun for experimentation but you
# probably don't want to use these simple QDiscs.
FIFO_LEN=100

####
# How often to perturb the hashes.
####
# This should probably be on the order of minutes so as to avoid the packet
# reordering which can happen when the flows are redistributed
# into different queues. Some of the new QDiscs may handle reordering properly.
#PERTURB=5
PERTURB=300

####
# Packet overhead
####
# Examples:
#   ADSL:
#    - http://www.adsl-optimizer.dk/thesis/
#    (http://web.archive.org/web/20090422131547/http://www.adsl-optimizer.dk/thesis/)
#    - If you are using ADSL you probably want LINKLAYER="atm" too.
#   VDSL2 (without ATM) w/ PPPoE:
#    - 40 bytes for 802.3
#    - 8 bytes for PPPoE
OVERHEAD=48

####
# Set linklayer to one of ethernet,adsl (adsl == atm).
####
#LINKLAYER="adsl"
LINKLAYER="ethernet"

####
# The MTU of the underlying interface.
####
MTU="1492"

####
# The keys that are used to identify individual flows.
####
# For 5-tuple (flow) fairness
#FLOW_KEYS="src,dst,proto,proto-src,proto-dst"
# For 5-tuple (flow) fairness when the same device is performing NAT
FLOW_KEYS="nfct-src,nfct-dst,nfct-proto,nfct-proto-src,nfct-proto-dst"

####
# The keys that are used to identify a host's traffic.
####
# No NAT
#HOST_KEYS="src"
# With local device doing NAT
HOST_KEYS="nfct-src"

# Set R2Q (HTB knob) low if you use low bitrates. You may see a warning from the kernel
# in /var/log/messages indicating this value should be modified. If you set the
# MTU/QUANTUM changing this isn't required.
#R2Q=2

####
# Choose the type of queue for each of the three per host priority classes
# Supported options:
#       drr
#       sfq
#       fq_codel
#       sfb
#       pfifo_head_drop
#       pfifo
####
HIGH_PRIORITY_QDISC_TYPE="fq_codel"
NORMAL_PRIORITY_QDISC_TYPE="fq_codel"
LOW_PRIORITY_QDISC_TYPE="fq_codel"

###########################################
###########################################
# Other than picking QDisc type there is nothing to change below here.
###########################################
###########################################

######################
# Expand the config variables to tc arguments if they are defined.
######################
if [ "${OVERHEAD}" != "" ]; then
    OVERHEAD="overhead ${OVERHEAD}"
fi

if [ "${LINKLAYER}" != "" ]; then
    LINKLAYER="linklayer ${LINKLAYER}"
fi

if [ "${R2Q}" != "" ]; then
    R2Q="r2q ${R2Q}"
fi

if [ "${PERTURB}" != "" ]; then
    PERTURB="perturb ${PERTURB}"
fi

QUANTUM=${MTU}
if [ "${QUANTUM}" != "" ]; then
    QUANTUM="quantum ${QUANTUM}"
fi

######################
# Utility functions
######################

function DEBUG()
{
    [ "$_DEBUG" == "on" ] && "$@"
}

# Debug function for printing the tc command lines.
function CDEBUG()
{
    [ "$_CDEBUG" == "on" ] && "$@"
}

function hex_replace {
    if [[ "$1" =~ ":" ]]; then
        QDISC=${1%%:*}
        CLASS=${1##*:}

        if [ "${CLASS}" == "" ]; then
            D2H=`printf "%x:" ${QDISC}`
        else
            D2H=`printf "%x:%x" ${QDISC} ${CLASS}`
        fi
    else
        D2H=`printf "%x" $1`
    fi
}

###
# Function to wrap the tc command and convert the QDisc and class
# identifiers to hex before calling tc.
###
function tc_h {
    OUTPUT="${TC} "

    PTMP=$@
    CDEBUG printf "Command before: %s\n" "${PTMP}"

    while [ "$1" != "" ]; do
        case "$1" in
                        # The tc parameters which take major:minor as an argument
            "classid" | "flowid" | "parent" | "baseclass" | "handle")
                hex_replace $2

                OUTPUT="${OUTPUT} $1 ${D2H} "
                shift
                ;;
            * )
                OUTPUT="${OUTPUT} $1 "
        esac

        shift
    done

    CDEBUG printf "Command after: ${OUTPUT}\n"

        # Run the command.
    ${OUTPUT}
}

function get_next_free_major {
        if [ "${FREE_MAJOR}" == "" ]; then
                FREE_MAJOR=2 # Assumes 1 is used.

                return
        fi

        FREE_MAJOR=$(expr ${FREE_MAJOR} + 1)
}

######################
# Functions to create QDiscs at the leaves.
######################

function drr {
    PARENT=$1
    HANDLE=$2

    # Create the QDisc.
    tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} drr

    # Create NUM_FLOW_BUCKETS classes and add a pfifo_head_drop to each.
    for J in `seq ${NUM_FLOW_BUCKETS}`; do
        tc_h class add dev ${DEVICE} parent ${HANDLE} classid ${HANDLE}:${J} drr ${QUANTUM}
        tc_h qdisc add dev ${DEVICE} parent ${HANDLE}:${J} pfifo_head_drop limit ${FIFO_LEN}
    done

    # Add a filter to direct the packets.
    tc_h filter add dev ${DEVICE} prio 1 protocol ip parent ${HANDLE}: handle 1 flow hash keys ${FLOW_KEYS} divisor ${NUM_FLOW_BUCKETS} ${PERTURB} baseclass ${HANDLE}:1
}

function sfq {
    PARENT=$1
    HANDLE=$2
    DEBUG printf "\t\t\tsfq parent %s handle %s\n" ${PARENT} ${HANDLE}

    #tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} sfq limit ${FIFO_LEN} ${QUANTUM} divisor 1024
    tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} sfq ${QUANTUM} divisor 1024

    # Don't use the SFQ default classifier.
    tc_h filter add dev ${DEVICE} prio 1 protocol ip parent ${HANDLE}: handle 1 flow hash keys ${FLOW_KEYS} divisor 1024 ${PERTURB} baseclass ${HANDLE}:1
}

function fq_codel {
    PARENT=$1
    HANDLE=$2
    DEBUG printf "\t\t\tfq_codel parent %s handle %s\n" ${PARENT} ${HANDLE}

    tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} fq_codel ${QUANTUM} flows 4096

    # Don't use the default classifier.
    tc_h filter add dev ${DEVICE} prio 1 protocol ip parent ${HANDLE}: handle 1 flow hash keys ${FLOW_KEYS} divisor 4096 ${PERTURB} baseclass ${HANDLE}:1
}

function sfb {
    PARENT=$1
    HANDLE=$2

    #tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} sfb
    tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} sfb target 20 max 25 increment 0.005 decrement 0.0001

    # TODO - Should this have divisor?
    tc_h filter add dev ${DEVICE} prio 1 protocol ip parent ${HANDLE}: handle 1 flow hash keys ${FLOW_KEYS} divisor 1024 ${PERTURB}
}

function pfifo_head_drop {
    PARENT=$1
    HANDLE=$2

    tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} pfifo_head_drop limit ${FIFO_LEN}
}

function pfifo {
    PARENT=$1
    HANDLE=$2

    tc_h qdisc add dev ${DEVICE} parent ${PARENT} handle ${HANDLE} pfifo limit ${FIFO_LEN}
}

function priority_class_qdisc {
    PARENT=$2
    HANDLE=$3

        case "$1" in
                "drr" )
                        drr ${PARENT} ${HANDLE}
                        ;;
                "sfq" )
                        sfq ${PARENT} ${HANDLE}
                        ;;
                "fq_codel" )
                        fq_codel ${PARENT} ${HANDLE}
                        ;;
                "sfb" )
                        sfb ${PARENT} ${HANDLE}
                        ;;
                "pfifo_head_drop" )
                        pfifo_head_drop ${PARENT} ${HANDLE}
                        ;;
                "pfifo" )
                        pfifo ${PARENT} ${HANDLE}
                        ;;
                * )
                        echo "Error: Unknown leaf QDisc type" >&2
                        exit 1
                        ;;
        esac
}

######################
# The real work starts here.
######################

# Calculate the divided rate value for use later.
DIV_RATE=`expr ${RATE} / ${NUM_HOST_BUCKETS}`

echo "Number of host buckets: ${NUM_HOST_BUCKETS}"
echo "Rate per host (DIV_RATE):" ${DIV_RATE}

# Delete any existing QDiscs if they exist.
tc_h qdisc del dev ${DEVICE} root

# HTB QDisc at the root. Traffic is steered into its classes by the filters added below.
tc_h qdisc add dev ${DEVICE} root handle 1: htb ${R2Q}

# Create a top level class with the max rate.
tc_h class add dev ${DEVICE} parent 1: classid 1:1 htb rate ${RATE}kbit ${QUANTUM} prio 0 ${LINKLAYER} ${OVERHEAD}

######
# Create NUM_HOST_BUCKETS classes within the top-level class.
# Within each of these create a DRR with three classes which implement the three priorities.
# Within each priority class create the configured leaf QDisc.
######
for HOST_NUM in `seq ${NUM_HOST_BUCKETS}`; do
    DEBUG printf "Create host class: %i\n" $HOST_NUM

    QID=`expr ${HOST_NUM} '+' 9` # 1+9=10 - Start host buckets at 10. Arbitrary.
    DEBUG printf "\tQID: %i\n" ${QID}
    tc_h class add dev ${DEVICE} parent 1:1 classid 1:${QID} htb rate ${DIV_RATE}kbit ceil ${RATE}kbit ${QUANTUM} prio 0 ${LINKLAYER} ${OVERHEAD}


    ######
        # Within each host class create a DRR QDisc within which we'll create the
        # high, normal and low priority classes.
    ######
        get_next_free_major
        SUB_MAJOR=${FREE_MAJOR}
        tc_h qdisc add dev ${DEVICE} parent 1:${QID} handle ${SUB_MAJOR}: drr

        # Filter from the host class to the DRR within it.
        tc_h filter add dev ${DEVICE} prio 2 protocol ip parent 1:${QID} u32 match ip dst 0.0.0.0/0 flowid ${SUB_MAJOR}:0


    ###
    # High priority class
    ###
    DEBUG printf "\t\tHigh: %s\n" "${SUB_MAJOR}:1"
        tc_h class add dev ${DEVICE} parent ${SUB_MAJOR}: classid ${SUB_MAJOR}:1 drr ${QUANTUM}

    # Create the leaf QDisc for this priority class.
        get_next_free_major
        SUB_PRIO_MAJOR=${FREE_MAJOR}
        priority_class_qdisc ${HIGH_PRIORITY_QDISC_TYPE} ${SUB_MAJOR}:1 ${SUB_PRIO_MAJOR}

    ###
    # Normal priority class
    ###
    DEBUG printf "\t\tNormal: %s\n" "${SUB_MAJOR}:2"
        tc_h class add dev ${DEVICE} parent ${SUB_MAJOR}: classid ${SUB_MAJOR}:2 drr ${QUANTUM}

    # Create the leaf QDisc for this priority class.
        get_next_free_major
        SUB_PRIO_MAJOR=${FREE_MAJOR}
        priority_class_qdisc ${NORMAL_PRIORITY_QDISC_TYPE} ${SUB_MAJOR}:2 ${SUB_PRIO_MAJOR}

    ###
    # Low priority class
    ###
    DEBUG printf "\t\tLow: %s\n" "${SUB_MAJOR}:3"
        tc_h class add dev ${DEVICE} parent ${SUB_MAJOR}: classid ${SUB_MAJOR}:3 drr ${QUANTUM}

    # Create the leaf QDisc for this priority class.
        get_next_free_major
        SUB_PRIO_MAJOR=${FREE_MAJOR}
        priority_class_qdisc ${LOW_PRIORITY_QDISC_TYPE} ${SUB_MAJOR}:3 ${SUB_PRIO_MAJOR}


    ######
    # Add filters to classify based on the TOS bits into the high, normal and low priority classes.
    # Only mask against the three (used) TOS bits. The final two bits are used for ECN.
    # TOS field is XXXDTRXX.
    # X= Not part of the TOS field.
    # D= Delay bit
    # T= Throughput bit
    # R= Reliability bit
    #
    # OpenSSH terminal sets D.
    # OpenSSH SCP/SFTP sets T.
    # It's easy to configure the Transmission Bittorrent client to set T (settings.json).
    # For home VoIP devices I use an iptables rule to set all of their traffic to have D.
    #
    # The thinking behind the below rules is to use D as an indication of delay sensitive
    # and T as an indication of background (big transfer). All other combinations are put into
    # default which is effectively a medium priority.
    ######
    DEBUG printf "\t\tCreating filters\n"

    # D bit set.
    tc_h filter add dev ${DEVICE} parent ${SUB_MAJOR}: protocol ip prio 10 u32 match ip tos 0x10 0x1c flowid ${SUB_MAJOR}:1

    # Diffserv expedited forwarding. Put this in the high priority class.
    # Some VoIP clients set this (ie Ekiga).
    # DSCP=b8
    tc_h filter add dev ${DEVICE} parent ${SUB_MAJOR}: protocol ip prio 10 u32 match ip tos 0xb8 0xfc flowid ${SUB_MAJOR}:1

    # T bit set.
    tc_h filter add dev ${DEVICE} parent ${SUB_MAJOR}: protocol ip prio 10 u32 match ip tos 0x08 0x1c flowid ${SUB_MAJOR}:3

    # Everything else into default.
    tc_h filter add dev ${DEVICE} parent ${SUB_MAJOR}: protocol ip prio 10 u32 match ip tos 0x00 0x00 flowid ${SUB_MAJOR}:2
done

# Send everything that hits the top level QDisc to the top class.
tc_h filter add dev ${DEVICE} prio 1 protocol ip parent 1:0 u32 match ip dst 0.0.0.0/0 flowid 1:1

# From the top level class hash into the host classes.
tc_h filter add dev ${DEVICE} prio 1 protocol ip parent 1:1 handle 1 flow hash keys ${HOST_KEYS} divisor ${NUM_HOST_BUCKETS} ${PERTURB} baseclass 1:10
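
The TOS filters near the end of the script are easier to follow with a few concrete values. The sketch below mimics the four u32 matches (value and mask pairs) in plain bash, assuming the filters are tried in the order they were added. The function name classify_tos is hypothetical and nothing here touches tc.

```shell
#!/bin/bash
# Hypothetical helper: map a TOS byte to the priority class the filters would pick.
classify_tos() {
    local tos=$(( $1 ))
    if   (( (tos & 0x1c) == 0x10 )); then echo high     # only the D (delay) bit set
    elif (( (tos & 0xfc) == 0xb8 )); then echo high     # DSCP expedited forwarding
    elif (( (tos & 0x1c) == 0x08 )); then echo low      # only the T (throughput) bit set
    else                                  echo normal   # everything else
    fi
}

classify_tos 0x10   # prints high   (interactive SSH)
classify_tos 0xb8   # prints high   (EF, e.g. VoIP)
classify_tos 0x08   # prints low    (bulk transfer)
classify_tos 0x18   # prints normal (D and T both set)
```

Because the 0x1c mask ignores the low two bits, ECN marks on a packet do not change which class it lands in.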
