### KERNEL TUNING ###
# Increase the limit on file handles and the inode cache
fs.file-max = 2097152
# Swap less aggressively
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2
# Time before the kernel considers migrating a process to another core
kernel.sched_migration_cost_ns = 5000000
# Uncomment to disable automatic grouping of tasks by TTY session
#kernel.sched_autogroup_enabled = 0
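# These VM and scheduler values can be trialed at runtime before being
# persisted here; a minimal check, using only the stock sysctl binary and
# procfs (nothing beyond a standard install is assumed):
#   sysctl -w vm.swappiness=10     # apply immediately, not persisted
#   cat /proc/sys/vm/swappiness    # confirm the running value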
### GENERAL NETWORK SECURITY OPTIONS ###
# Number of times SYN+ACKs are retried for a passive TCP connection
net.ipv4.tcp_synack_retries = 2
# Allowed local port range
net.ipv4.ip_local_port_range = 2000 65535
# Protect against TCP time-wait assassination (RFC 1337)
net.ipv4.tcp_rfc1337 = 1
# Enable SYN cookies
net.ipv4.tcp_syncookies = 1
# Decrease the default value of tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 15
# Probe idle connections sooner than the default keepalive settings
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
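# Taken together, the three keepalive values above mean an idle connection
# is first probed after 300 s and then every 15 s up to 5 times, so a dead
# peer is torn down after roughly 300 + 5 * 15 = 375 seconds. The kernel
# defaults (7200 s, 9 probes, 75 s intervals) work out to
# 7200 + 9 * 75 = 7875 s, about 2.2 hours.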
### TUNING NETWORK PERFORMANCE ###
# Default socket receive buffer
net.core.rmem_default = 31457280
# Maximum socket receive buffer
net.core.rmem_max = 33554432
# Default socket send buffer
net.core.wmem_default = 31457280
# Maximum socket send buffer
net.core.wmem_max = 33554432
# Increase the maximum number of queued incoming connections
net.core.somaxconn = 65535
# Increase the network device input-queue backlog
net.core.netdev_max_backlog = 65536
# Increase the maximum amount of option memory buffers
net.core.optmem_max = 25165824
# Increase the maximum total buffer space allocatable
# This is measured in units of pages (4096 bytes)
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.udp_mem = 65536 131072 262144
# Increase the read-buffer space allocatable
net.ipv4.tcp_rmem = 8192 87380 33554432
net.ipv4.udp_rmem_min = 16384
# Increase the write-buffer space allocatable
net.ipv4.tcp_wmem = 8192 65536 33554432
net.ipv4.udp_wmem_min = 16384
# Increase the TCP time-wait bucket pool size to mitigate simple DoS attacks
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
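# One way to load this file once saved (the path below is only an example)
# and to spot-check that a value took effect:
#   sysctl -p /etc/sysctl.d/99-tuning.conf
#   sysctl net.ipv4.tcp_fin_timeout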
To everyone directly copying this config file: the option net.ipv4.tcp_tw_recycle = 1 will very likely cause major issues and strange network behavior in virtualized environments and behind load balancers or firewalls. The observed symptoms are random stalls of up to 15 seconds, and in a packet capture the packets will simply "disappear".
Refer to https://www.speedguide.net/articles/linux-tweaking-121 to see whether this setting is recommended or needed in your case, as there is no clear performance benefit to using it.
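A quick way to check whether your kernel still has this knob at all (it was removed upstream in 4.12, so on newer kernels the lookup simply fails) is a plain sysctl query; nothing beyond a stock sysctl binary is assumed:

    sysctl net.ipv4.tcp_tw_recycle 2>/dev/null || echo "tcp_tw_recycle not present on this kernel"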
Ugh. Try that with a fast Ethernet card and see what happens on the receiving end... I think it will overwhelm the receiver's ring buffer at 1 Gbps already. I would not go over 128K on the sending side of a fast link, certainly not at 10G.
So the machine handles 64K clients and buffers up to 32 MB of output for each: 2 TB of write buffers. That's a lot of RAM and cores to support, and a very, very large stack of 50 Gbps cards. What is the point of handling this huge workload in one monstrous machine? It is probably doable with a non-trivial PCIe topology, plus OmniPath interconnects thrown in, but a couple of 42U racks of smaller servers could do the same job with much better redundancy at a lower cost, given that even 50GbE cards are uncommon and expensive, and you do not build non-uniform PCIe topologies in your garage. Without an explanation of why, this looks... well, like quite an unusual configuration.
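For anyone checking the arithmetic, the 2 TB figure is just this config's wmem_max multiplied by the 64K connections implied by somaxconn:

    echo $(( 65535 * 33554432 ))   # prints 2198989701120, i.e. ~2 TiB of potential send buffers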
tw_recycle can be useful in virtualized environments, but only in combination with an application that is designed for connection reuse.
The fact is, tweaking kernel parameters has nuanced effects on your system; everything has a cost-benefit tradeoff.
Copy-pasting someone else's sysctl settings without understanding the implications will often make your performance worse, even if the settings worked well for the person who posted them. There isn't a good one-size-fits-all "performance" config.
All that said, there are a few settings, like net.ipv4.tcp_tw_recycle, that are almost never a good idea to enable. If you are behind a NAT device, or if clients use per-connection timestamp randomization (the default in Linux 4.10+ and a few other OSes), you're likely to have problems with it.
Unfortunately, a lot of the kernel parameters are not well documented, so fully understanding the implications can be a non-trivial task. If you don't have time for research and experimentation, sticking with the kernel defaults is the safer choice.
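A low-risk way to experiment is to flip one setting at a time at runtime, measure, and revert; sysctl -w changes are not persisted, so a reboot also restores your configured values. Using tcp_fin_timeout purely as an example:

    sysctl net.ipv4.tcp_fin_timeout         # note the current value (kernel default is 60)
    sysctl -w net.ipv4.tcp_fin_timeout=15   # trial value
    # ... run your benchmark ...
    sysctl -w net.ipv4.tcp_fin_timeout=60   # revert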
net.ipv4.tcp_rfc1337 = 1
This actually complies with RFC 1337 but does *not* provide the protection; it needs to be set to 0. The info is in the source code.
net.ipv4.tcp_tw_recycle = 1 is dangerous: it will break connectivity for users behind NAT, and this sysctl has now been removed in newer kernels (4.12+).
Also, the default rmem and wmem values here are set way too high, a recipe for resource exhaustion.
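If a box does start running out of socket memory, the kernel is at least vocal about it; two quick places to look (standard procfs and the kernel log, nothing exotic assumed):

    cat /proc/net/sockstat              # live TCP/UDP buffer usage, counted in pages
    dmesg | grep -i "out of memory"     # e.g. "TCP: out of memory -- consider tuning tcp_mem"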
No. By my reading of the source code, tcp_rfc1337 = 1 does protect against time-wait assassinations. See https://serverfault.com/questions/787624/why-isnt-net-ipv4-tcp-rfc1337-enabled-by-default#comment1380898_789212