### KERNEL TUNING ###
# Increase size of file handles and inode cache
fs.file-max = 2097152
# Do less swapping
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2
# Sets the time before the kernel considers migrating a process to another core
kernel.sched_migration_cost_ns = 5000000
# Group tasks by TTY
#kernel.sched_autogroup_enabled = 0
### GENERAL NETWORK SECURITY OPTIONS ###
# Number of SYN-ACK retransmissions for a passive TCP connection
net.ipv4.tcp_synack_retries = 2
# Allowed local port range
net.ipv4.ip_local_port_range = 2000 65535
# Protect against TCP time-wait assassination (RFC 1337)
net.ipv4.tcp_rfc1337 = 1
# Control syncookies
net.ipv4.tcp_syncookies = 1
# Decrease the default value of tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 15
# Decrease the default keepalive time for connections
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
### TUNING NETWORK PERFORMANCE ###
# Default socket receive buffer
net.core.rmem_default = 31457280
# Maximum socket receive buffer
net.core.rmem_max = 33554432
# Default socket send buffer
net.core.wmem_default = 31457280
# Maximum socket send buffer
net.core.wmem_max = 33554432
# Increase number of incoming connections
net.core.somaxconn = 65535
# Increase number of incoming connections backlog
net.core.netdev_max_backlog = 65536
# Increase the maximum amount of option memory buffers
net.core.optmem_max = 25165824
# Increase the maximum total buffer-space allocatable
# This is measured in units of pages (4096 bytes)
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.udp_mem = 65536 131072 262144
# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 33554432
net.ipv4.udp_rmem_min = 16384
# Increase the write-buffer space allocatable
net.ipv4.tcp_wmem = 8192 65536 33554432
net.ipv4.udp_wmem_min = 16384
# Increase the tcp time-wait buckets pool size to prevent simple DOS attacks
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
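If you do want to experiment with individual values from the file above, it is safer to check the current setting first and change one parameter at a time rather than loading the whole file; a minimal sketch using sysctl (the parameter is just an example taken from the list above):

# show the current value before touching anything
sysctl net.core.somaxconn
# apply a single setting at runtime (not persistent across reboots)
sysctl -w net.core.somaxconn=65535
# verify what the running kernel actually ended up with
sysctl -a | grep somaxconn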
net.core.wmem_default = 31457280
Ugh. Try that with a fast Ethernet card and see what happens on the receiving end... I think it will kill the receiver's ring buffer at 1 Gbps already. I would not go over 128K on the sending side of a fast link, and certainly not at 10G.
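For what it's worth, the receive-side ring buffer the comment refers to can be inspected, and on most drivers resized, with ethtool; a rough sketch, assuming the interface is named eth0:

# show current and hardware-maximum RX/TX ring sizes
ethtool -g eth0
# grow the rings toward the hardware maximum, if the driver allows it
ethtool -G eth0 rx 4096 tx 4096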
net.core.somaxconn = 65535
So, the machine handles 64K clients, and is buffering up to 32MB for each on its output. 2 TB of write buffers. That's a lot of RAM and cores to support, and a very, very large stack of 50Gbps cards. What is the point of handling this huge workload in one monstrous machine? Probably doable with a non-trivial PCIe topology and some OmniPath interconnects thrown in, but it still seems like a couple of 42U racks of smaller servers could do the same job with much better redundancy at a lower cost, given that even 50GbE cards are uncommon and expensive, and you do not build non-uniform PCIe topologies in your garage. Without an explanation why, this looks... well, like quite an unusual configuration.
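For readers checking the arithmetic behind that estimate: taking the comment's reading of 64K clients, each allowed up to the 32 MiB wmem_max, the worst case is roughly 65535 * 33554432 B, which is about 2.2 * 10^12 bytes, i.e. close to 2 TiB of send buffers alone.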
To everyone directly copying this config file: the option net.ipv4.tcp_tw_recycle = 1 will very likely cause major issues and strange network behavior in virtualized environments and behind load balancers or firewalls. The observed behavior will be random stalls of up to 15 seconds, and during a packet capture packets will simply "disappear".
Refer to https://www.speedguide.net/articles/linux-tweaking-121 to see if this setting is recommended/needed for you as there is no clear performance benefit when using this setting.
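On kernels old enough to still have the knob, you can check whether it is set and turn it off at runtime; a short sketch:

# query the current value (fails on kernels where the option has been removed)
sysctl net.ipv4.tcp_tw_recycle
# disable it immediately if it is enabled
sysctl -w net.ipv4.tcp_tw_recycle=0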
tw_recycle can be useful in virtualized environments when used in combination with software designed for connection reuse.
comments here are helpless but confusing people.
Why don't you provide the system parameters you used this configuration for, so it can be understood better?
comments here are helpless but confusing people.
The fact is, tweaking kernel parameters has nuanced effects on your system. Everything has a cost-benefit tradeoff.
Copy-pasting someone else's sysctl settings without understanding the implications will often make one's performance worse -- even if it worked well for the person who posted the config. There isn't a good one-size-fits-all "performance" config.
All that said, there are a few settings, like net.ipv4.tcp_tw_recycle, where there is good general guidance: it's almost never a good idea to set net.ipv4.tcp_tw_recycle=1.
If you are behind a NAT device, or clients use per-connection timestamp randomization (the default in Linux 4.10+ and a few other OSes), you're likely to have problems. The net.ipv4.tcp_tw_recycle option was removed in Linux 4.12.
Unfortunately, a lot of the kernel parameters are not well documented, so fully understanding the implications can be a non-trivial task. If you don't have time for research and experimentation, using tuned is your next best bet.
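For reference, tuned works by applying predefined profiles that bundle sysctl and related tunings; a minimal sketch of typical usage (the profile name is just an example):

tuned-adm list                            # show available profiles
tuned-adm active                          # show the currently applied profile
tuned-adm profile throughput-performance  # switch to a profile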
@brandt, TIL about tuned, thanks!
comments here are helpless but confusing people.
You mean unhelpful. I think, taken all together, the comments send a pretty reasonable message: just do not copy this configuration, it does not make sense. Sapienti sat.
Never change kernel parameters on a production system without understanding the consequences and having tested it first.
Better not to blindly copy-paste this config. It's more advisable to override individual settings only when you run into a related error.
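One way to do that kind of targeted override is a small drop-in file instead of replacing the whole sysctl.conf; a sketch where the file name and the chosen parameter are only examples:

# /etc/sysctl.d/90-somaxconn.conf -- override only the setting the error points at
net.core.somaxconn = 4096

# then reload all sysctl configuration
sysctl --system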
net.ipv4.tcp_rfc1337 = 1
Actually complies with rfc1337, and does "not" provide the protection, needs to be set to 0. Info is in the source code.
net.ipv4.tcp_tw_recycle = 1 is dangerous: it will break connectivity for users behind NAT, and this sysctl has now been removed in newer kernels.
Also, the default rmem and wmem are set way too high, a recipe for resource exhaustion.
net.ipv4.tcp_rfc1337 = 1
Actually complies with rfc1337, and does "not" provide the protection, needs to be set to 0. Info is in the source code.
No. By my reading of the source code, tcp_rfc1337 = 1 does protect against time-wait assassinations. See https://serverfault.com/questions/787624/why-isnt-net-ipv4-tcp-rfc1337-enabled-by-default#comment1380898_789212
Hi, I have a couple of questions regarding the buffer tunings.
- Why are there different values for the TCP and UDP buffers (net.ipv4.tcp_mem = 786432 1048576 26777216 vs. net.ipv4.udp_mem = 65536 131072 262144)?
- net.ipv4.tcp_wmem = 8192 65536 33554432 seems to be double the value of most recommendations. How did you end up with those values? What MTU value are you considering?
- Did you consider any offloading capacity of the NIC, like segmentation offloading, etc.?
There's a typo here: 26777216 should be 16777216; 16777216 is 1<<24, or exactly 16MiB.
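Worth noting when sanity-checking that value: tcp_mem is counted in pages (4096 bytes each, per the comment in the config), not bytes, so with the corrected 16777216 the three thresholds work out to roughly:

786432 pages   * 4096 B = 3 GiB   (low)
1048576 pages  * 4096 B = 4 GiB   (pressure)
16777216 pages * 4096 B = 64 GiB  (high)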
Better not to blindly copy paste this config.
absolutely!
If you search the internet for 26777216, you find a lot of this typo, almost always in sysctl.conf files. I'm wondering where the source of this is.