High Traffic Server Settings on RHEL / AlmaLinux / Rocky / EuroLinux / CentOS 8

@fschiettecatte · Last active January 22, 2024

I recently did some work to optimize the network configuration of an AlmaLinux 8 based web server that receives a lot of traffic.

Of course these settings also apply to RHEL / Rocky / EuroLinux / CentOS 8 (hereafter referred to as Linux 8). I expect they will also work on RHEL / AlmaLinux / Rocky / EuroLinux 9, but I have not yet tested them there.

There is a lot of information on the web about this, and it distills down to two configurations: a minimum recommended configuration and a recommended configuration.

The minimum recommended configuration should be sufficient for servers with less than 10Gb/s of network bandwidth, and the recommended configuration for servers with 10Gb/s or more.

Minimum recommended configuration:

# Google's BBR congestion control algorithm (default: cubic)
# https://research.google/pubs/pub45646/
sysctl -w net.ipv4.tcp_congestion_control=bbr

# BBR requires fq for queue management (default: fq_codel)
sysctl -w net.core.default_qdisc=fq

# Seconds a socket stays in FIN-WAIT-2 before the kernel tears it down (default: 60)
sysctl -w net.ipv4.tcp_fin_timeout=30

# Disable the gradual speed increase, useful on variable-speed WANs but not for us (default: 1)
sysctl -w net.ipv4.tcp_slow_start_after_idle=0

# Avoid MTU black holes (default: 0)
sysctl -w net.ipv4.tcp_mtu_probing=1

You can check the current/default values before making any changes:

sysctl net.ipv4.tcp_congestion_control
sysctl net.core.default_qdisc
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_slow_start_after_idle
sysctl net.ipv4.tcp_mtu_probing
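
Before changing anything, it can also be worth snapshotting the current values so you can roll back later. A minimal sketch (the output path here is just an illustration):

# Save the current values for a later rollback if needed
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc \
       net.ipv4.tcp_fin_timeout net.ipv4.tcp_slow_start_after_idle \
       net.ipv4.tcp_mtu_probing > /root/sysctl-before.txt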

Defaults on Linux 8 are as follows:

sysctl -w net.ipv4.tcp_congestion_control=cubic
sysctl -w net.core.default_qdisc=fq_codel
sysctl -w net.ipv4.tcp_fin_timeout=60
sysctl -w net.ipv4.tcp_slow_start_after_idle=1
sysctl -w net.ipv4.tcp_mtu_probing=0

To make the changes persist across reboots, add the following to /etc/sysctl.d/99-linux.internal.conf:

# Google's BBR congestion control algorithm (default: cubic)
# https://research.google/pubs/pub45646/
net.ipv4.tcp_congestion_control = bbr

# BBR requires fq for queue management (default: fq_codel)
net.core.default_qdisc = fq

# Seconds a socket stays in FIN-WAIT-2 before the kernel tears it down (default: 60)
net.ipv4.tcp_fin_timeout = 30

# Disable the gradual speed increase, useful on variable-speed WANs but not for us (default: 1)
net.ipv4.tcp_slow_start_after_idle = 0

# Avoid MTU black holes (default: 0)
net.ipv4.tcp_mtu_probing = 1

Once added to /etc/sysctl.d/99-linux.internal.conf, you can reload the settings as follows:

sysctl --load=/etc/sysctl.d/99-linux.internal.conf
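
If you keep several files under /etc/sysctl.d/, sysctl --system will reload them all in one go. You can also confirm that BBR is actually in use on live connections, since ss reports the congestion control algorithm per socket:

# Reload every file under /etc/sysctl.d/ (plus /etc/sysctl.conf)
sysctl --system

# Established TCP sockets should now show "bbr" in their details
ss -ti | grep -c bbr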

Recommended configuration:

# Maximum number of packets queued on the input side (default: 1000)
sysctl -w net.core.netdev_max_backlog=30000

# Maximum receive socket buffer size (default: 212992)
sysctl -w net.core.rmem_max=134217728

# Maximum send socket buffer size (default: 212992)
sysctl -w net.core.wmem_max=134217728

# Minimum, initial and max TCP receive buffer size in bytes (default: 4096 87380 6291456)
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"

# Minimum, initial and max TCP send buffer size in bytes (default: 4096 20480 4194304)
sysctl -w net.ipv4.tcp_wmem="4096 87380 134217728"

# Allowed local port range (default: 32768 60999)
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Google's BBR congestion control algorithm (default: cubic)
# https://research.google/pubs/pub45646/
sysctl -w net.ipv4.tcp_congestion_control=bbr

# BBR requires fq for queue management (default: fq_codel)
sysctl -w net.core.default_qdisc=fq

# Seconds a socket stays in FIN-WAIT-2 before the kernel tears it down (default: 60)
sysctl -w net.ipv4.tcp_fin_timeout=30

# Maximum SYN backlog (default: 1024)
sysctl -w net.ipv4.tcp_max_syn_backlog=8096

# Disable the gradual speed increase, useful on variable-speed WANs but not for us (default: 1)
sysctl -w net.ipv4.tcp_slow_start_after_idle=0

# Don't cache ssthresh from previous connection (default: 0)
sysctl -w net.ipv4.tcp_no_metrics_save=1

# Avoid MTU black holes (default: 0)
sysctl -w net.ipv4.tcp_mtu_probing=1
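
For context, the 134217728 byte (128 MiB) ceilings above roughly match the bandwidth-delay product of a fast, long path. Assuming a 10Gb/s link and a 100ms round trip (illustrative figures, not measurements from this server):

# Bandwidth-delay product: bytes in flight = (bits per second / 8) * RTT in seconds
# 10Gb/s over a 100ms round trip:
echo $(( 10000000000 / 8 / 10 ))   # 125000000 bytes, so a 128 MiB cap covers one such flow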

You can check the current/default values before making any changes:

sysctl net.core.netdev_max_backlog
sysctl net.core.rmem_max
sysctl net.core.wmem_max
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_congestion_control
sysctl net.core.default_qdisc
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_max_syn_backlog
sysctl net.ipv4.tcp_slow_start_after_idle
sysctl net.ipv4.tcp_no_metrics_save
sysctl net.ipv4.tcp_mtu_probing

Defaults on Linux 8 are as follows:

sysctl -w net.core.netdev_max_backlog=1000
sysctl -w net.core.rmem_max=212992
sysctl -w net.core.wmem_max=212992
sysctl -w net.ipv4.tcp_rmem="4096 87380 6291456"
sysctl -w net.ipv4.tcp_wmem="4096 20480 4194304"
sysctl -w net.ipv4.ip_local_port_range="32768 60999"
sysctl -w net.ipv4.tcp_congestion_control=cubic
sysctl -w net.core.default_qdisc=fq_codel
sysctl -w net.ipv4.tcp_fin_timeout=60
sysctl -w net.ipv4.tcp_max_syn_backlog=1024
sysctl -w net.ipv4.tcp_slow_start_after_idle=1
sysctl -w net.ipv4.tcp_no_metrics_save=0
sysctl -w net.ipv4.tcp_mtu_probing=0

To make the changes persist across reboots, add the following to /etc/sysctl.d/99-linux.internal.conf:

# Maximum number of packets queued on the input side (default: 1000)
net.core.netdev_max_backlog = 30000

# Maximum receive socket buffer size (default: 212992)
net.core.rmem_max = 134217728

# Maximum send socket buffer size (default: 212992)
net.core.wmem_max = 134217728

# Minimum, initial and max TCP receive buffer size in bytes (default: 4096 87380 6291456)
net.ipv4.tcp_rmem = 4096 87380 134217728

# Minimum, initial and max TCP send buffer size in bytes (default: 4096 20480 4194304)
net.ipv4.tcp_wmem = 4096 87380 134217728

# Allowed local port range (default: 32768 60999)
net.ipv4.ip_local_port_range = 1024 65535

# Google's BBR congestion control algorithm (default: cubic)
# https://research.google/pubs/pub45646/
net.ipv4.tcp_congestion_control = bbr

# BBR requires fq for queue management (default: fq_codel)
net.core.default_qdisc = fq

# Seconds a socket stays in FIN-WAIT-2 before the kernel tears it down (default: 60)
net.ipv4.tcp_fin_timeout = 30

# Maximum SYN backlog (default: 1024)
net.ipv4.tcp_max_syn_backlog = 8096

# Disable the gradual speed increase, useful on variable-speed WANs but not for us (default: 1)
net.ipv4.tcp_slow_start_after_idle = 0

# Don't cache ssthresh from previous connection (default: 0)
net.ipv4.tcp_no_metrics_save = 1

# Avoid MTU black holes (default: 0)
net.ipv4.tcp_mtu_probing = 1

Once added to /etc/sysctl.d/99-linux.internal.conf, you can reload the settings as follows:

sysctl --load=/etc/sysctl.d/99-linux.internal.conf
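
One caveat with widening ip_local_port_range down to 1024: a daemon listening on a fixed port inside that range can now collide with an ephemeral source port. If that applies to you, net.ipv4.ip_local_reserved_ports lets you carve those ports out. A sketch with placeholder ports:

# Keep the kernel from handing these out as ephemeral source ports
# (8080 and 9000-9010 are placeholders, substitute your daemons' ports)
sysctl -w net.ipv4.ip_local_reserved_ports="8080,9000-9010"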

Adding BBR to the available congestion control algorithms

By default BBR is not available on Linux 8. You can check which congestion control algorithms are currently available as follows:

[root@linux]# sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic

You can list the IPv4 kernel modules present on your system as follows:

[root@linux]# ls -la /lib/modules/$(uname -r)/kernel/net/ipv4/
...
tcp_bbr.ko.xz
...

The module you are looking for is tcp_bbr.ko.xz.

Enable it with modprobe:

[root@linux]# modprobe -v -a tcp_bbr
insmod /lib/modules/4.18.0-193.14.2.el8_2.x86_64/kernel/net/ipv4/tcp_bbr.ko.xz

You can then confirm that BBR is available as follows:

[root@linux]# sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic bbr
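
Note that modprobe by itself does not persist across reboots. The kernel can usually autoload the module on demand when bbr is requested via sysctl, but if you want it loaded explicitly at boot, systemd's modules-load.d mechanism works:

# Load tcp_bbr explicitly at every boot
echo tcp_bbr > /etc/modules-load.d/tcp_bbr.conf
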
@orachimary

Are these settings suitable for Linux acting as a router with a 10g card? I am experiencing slow speeds with a 12ms delay between such routers.

@fschiettecatte (Author)

I have never tried that, I guess you should try and create a gist with your configuration and results.

@yaser-k commented Jul 24, 2023

Hi, thank you for sharing your great work.
I am trying to tune a web server and, besides your config, found these two good articles. I wonder if I could add anything from them to your config without breaking something.

https://crossbox.io/documentation/page/performance-tuning

https://blog.cloudflare.com/http-2-prioritization-with-nginx/

@fschiettecatte (Author)

Hi,

The CrossBox article has pretty much the same advice as the gist above; they also have suggestions for open file and socket limits, which I feel are out of scope here. The suggestion about swappiness is a good one, and I usually run my servers with this configuration:

# Minimize swappiness
vm.swappiness = 1

# Cap memory commitment
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
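
To see how those overcommit settings play out at runtime, /proc/meminfo reports both the commit limit and what is currently committed:

# CommitLimit is derived from overcommit_ratio; Committed_AS should stay below it
grep -E 'CommitLimit|Committed_AS' /proc/meminfo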

The Cloudflare article is focused more on HTTP/2 and how tcp_notsent_lowat can help with prioritization. You would need to experiment with that to see how it best works with your application.
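
If you do experiment with it, tcp_notsent_lowat is just another sysctl. The Cloudflare article uses 16384, which I would treat as a starting point rather than a benchmarked recommendation:

# Limit unsent data buffered per socket so high-priority frames are not
# stuck behind bulk data (16384 is the value from the Cloudflare article)
sysctl -w net.ipv4.tcp_notsent_lowat=16384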

I also have another gist covering settings for 10Gb NICs, which might be helpful if you are using one.
