@teocci
Last active November 23, 2020 02:01
Tuning Nginx for Best Performance

This article is part 2 of a series about building a high-performance web cluster powerful enough to handle 3 million requests per second. For this part of the project, you can use any web server you like. I decided to use Nginx, because it’s lightweight, reliable, and fast.

Generally, a properly tuned Nginx server on Linux can handle 500,000 – 600,000 requests per second. My Nginx servers consistently handle 904k req/sec, and have sustained high loads like these for the ~12 hours that I tested them.

It’s important to know that everything listed here was used in a testing environment, and that you might actually want very different settings for your production servers.

Install the Nginx package from the EPEL repository.

yum -y install nginx

Back up the original config, and start hacking away at a config of your own.

cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.orig
vim /etc/nginx/nginx.conf
# This number should be, at maximum, the number of CPU cores on your system,
# since nginx doesn't benefit from more than one worker per CPU.
worker_processes 24;

# Number of file descriptors used for Nginx. This is set in the OS with 'ulimit -n 200000'
# or using /etc/security/limits.conf
worker_rlimit_nofile 200000;

# only log critical errors
error_log /var/log/nginx/error.log crit;

# Determines how many clients will be served by each worker process.
# (Max clients = worker_connections * worker_processes)
# "Max clients" is also limited by the number of socket connections available on the system (~64k)
# Note: worker_connections, use, and multi_accept belong inside the events { } block.
worker_connections 4000;

# essential for linux, optimized to serve many clients with each thread
use epoll;

# Accept as many connections as possible, after nginx gets notification about a new connection.
# May flood worker_connections, if that option is set too low.
multi_accept on;

# Caches information about open FDs, freqently accessed files.
# Changing this setting, in my environment, brought performance up from 560k req/sec, to 904k req/sec.
# I recommend using some variant of these options, though not the specific values listed below.
open_file_cache max=200000 inactive=20s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
open_file_cache_errors on;

# Buffer log writes to speed up IO, or disable them altogether
#access_log /var/log/nginx/access.log main buffer=16k;
access_log off;

# Sendfile copies data between one FD and another from within the kernel.
# More efficient than read() + write(), since that requires transferring data to and from user space.
sendfile on;

# Tcp_nopush causes nginx to attempt to send its HTTP response head in one packet,
# instead of using partial frames. This is useful for prepending headers before calling sendfile,
# or for throughput optimization.
tcp_nopush on;

# don't buffer data-sends (disable Nagle algorithm). Good for sending frequent small bursts of data in real time.
tcp_nodelay on;

# Timeout for keep-alive connections. Server will close connections after this time.
keepalive_timeout 30;

# Number of requests a client can make over the keep-alive connection. This is set high for testing.
keepalive_requests 100000;

# allow the server to close the connection after a client stops responding. Frees up socket-associated memory.
reset_timedout_connection on;

# send the client a "request timed out" if the body is not loaded by this time. Default 60.
client_body_timeout 10;

# If the client stops reading data, free up the stale client connection after this much time. Default 60.
send_timeout 2;

# Compression. Reduces the amount of data that needs to be transferred over the network
gzip on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml;
gzip_disable "MSIE [1-6]\.";
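Before starting the server, it's worth sanity-checking the arithmetic behind the settings above. A quick shell sketch (the numbers simply mirror the config; nothing here queries a live nginx):

```shell
#!/bin/sh
# Theoretical connection ceiling from the config above:
# worker_processes * worker_connections, capped in practice by
# worker_rlimit_nofile and the system-wide socket limit.
WORKER_PROCESSES=24
WORKER_CONNECTIONS=4000
MAX_CLIENTS=$((WORKER_PROCESSES * WORKER_CONNECTIONS))
echo "theoretical max clients: $MAX_CLIENTS"   # 96000

# A proxied connection can use two FDs (client + upstream), so the
# rlimit of 200000 leaves headroom over 2 * 96000 = 192000.
echo "fd headroom: $((200000 - 2 * MAX_CLIENTS))"   # 8000
```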

Start up Nginx and set it to start on boot.

service nginx start
chkconfig nginx on

Now point Tsung at this server and let it go. It’ll run for ~10 minutes before it hits the server’s peak capabilities, depending on your Tsung config.

vim ~/.tsung/tsung.xml
      <server host="YOURWEBSERVER" port="80" type="tcp"/>
tsung start

Hit ctrl+C after you’re satisfied with the test results, otherwise it’ll run for hours. Use the alias “treport” that we set up earlier to view the results.

Web server tuning, part 2: TCP stack tuning

This section applies to any web server, not just Nginx. Tuning the kernel's TCP settings will help you make the most of your bandwidth. These settings worked best for me on a 10GBASE-T network. My network's performance went from ~8 Gbps with the default system settings to 9.3 Gbps using these tuned settings. As always, your mileage may vary.

When tuning these options, I recommend changing just one at a time. Then run a network benchmark tool like ‘netperf’, ‘iperf’, or something like my script, cluster-netbench.pl, to test more than one pair of nodes at a time.
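Before changing anything, capture a baseline of the current values so you can tell which edits actually helped. A minimal sketch that reads the /proc/sys mirror of each sysctl name (the key list matches the settings tuned below):

```shell
#!/bin/sh
# Each sysctl key maps to a file under /proc/sys with dots turned into
# slashes, e.g. net.core.somaxconn -> /proc/sys/net/core/somaxconn
for key in net.ipv4.ip_local_port_range \
           net.ipv4.tcp_max_syn_backlog \
           net.core.somaxconn \
           net.core.rmem_max \
           net.core.wmem_max; do
    path="/proc/sys/$(echo "$key" | tr '.' '/')"
    printf '%s = %s\n' "$key" "$(cat "$path" 2>/dev/null || echo unavailable)"
done
```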

yum -y install netperf iperf
vim /etc/sysctl.conf

Sysctl file:

# Increase system IP port limits to allow for more connections
net.ipv4.ip_local_port_range = 2000 65000

# Enable TCP window scaling so receive windows can grow beyond 64 KB
net.ipv4.tcp_window_scaling = 1

# number of half-open (SYN received, not yet completed) connections the
# kernel will queue before it starts dropping new connection requests
net.ipv4.tcp_max_syn_backlog = 3240000

# increase socket listen backlog
net.core.somaxconn = 3240000

# maximum number of sockets held in TIME_WAIT state at once
net.ipv4.tcp_max_tw_buckets = 1440000

# Increase TCP buffer sizes
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# use the CUBIC congestion-control algorithm
net.ipv4.tcp_congestion_control = cubic
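The buffer maximums above can be sanity-checked against the link's bandwidth-delay product, the amount of data that can be "in flight" at once. A quick sketch, assuming a 10 Gbit/s link and a ~2 ms LAN round-trip time (the RTT is an illustrative assumption, not a measured figure):

```shell
#!/bin/sh
# Bandwidth-delay product: TCP buffers need to hold at least this many
# bytes to keep the pipe full.
BANDWIDTH_BPS=10000000000   # 10 Gbit/s link
RTT_MS=2                    # assumed LAN round-trip time

# bytes in flight = (bits/sec / 8) * (RTT in seconds)
BDP=$((BANDWIDTH_BPS / 8 * RTT_MS / 1000))
echo "BDP: $BDP bytes"      # 2500000 -- well under the 16 MB rmem/wmem max
```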

Apply the new settings in between each change.

sysctl -p /etc/sysctl.conf

Don’t forget to run your network benchmark program between each change! It’s important to keep track of what settings work for you. You’ll save yourself a lot of time by being methodical in your testing.
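One way to stay methodical is a small wrapper that logs each setting next to the benchmark figure it produced. A sketch with a placeholder benchmark (run_benchmark and the log filename are hypothetical; swap in your actual netperf/iperf invocation):

```shell
#!/bin/sh
# Hypothetical apply-then-measure record keeping for one-change-at-a-time tuning.
LOG="${LOG:-tuning-results.log}"

run_benchmark() {
    # Placeholder -- replace with your real benchmark, e.g. an iperf run
    # whose throughput figure you extract from the output.
    echo "placeholder-result"
}

record_trial() {
    key="$1"; val="$2"
    # In a real run, apply the change first: sysctl -w "$key=$val"
    printf '%s = %s -> %s\n' "$key" "$val" "$(run_benchmark)" >> "$LOG"
}

record_trial net.core.rmem_max 16777216
tail -1 "$LOG"
```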

Part 3: Building a Load-Balancing Cluster with LVS
