@kjmkznr
Created November 13, 2014 12:33

Nginx vs h2o vs lwan (HTTP/1.1)

Environment

WebServer

  • AWS EC2 c3.xlarge (ap-northeast-1a)
  • cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

LoadWorker

  • AWS EC2 c3.xlarge (ap-northeast-1a)
  • cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x415
cpu MHz         : 2800.072
cache size      : 25600 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips        : 5600.14
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

nginx

$ /usr/sbin/nginx -V
nginx version: nginx/1.7.6
TLS SNI support enabled
configure arguments: --prefix=/usr --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error_log --pid-path=/run/nginx.pid --lock-path=/run/lock/nginx.lock --with-cc-opt=-I/usr/include --with-ld-opt=-L/usr/lib64 --http-log-path=/var/log/nginx/access_log --http-client-body-temp-path=//var/lib/nginx/tmp/client --http-proxy-temp-path=//var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=//var/lib/nginx/tmp/fastcgi --http-scgi-temp-path=//var/lib/nginx/tmp/scgi --http-uwsgi-temp-path=//var/lib/nginx/tmp/uwsgi --with-ipv6 --with-pcre --without-http_access_module --without-http_auth_basic_module --without-http_autoindex_module --without-http_browser_module --without-http_charset_module --without-http_empty_gif_module --without-http_fastcgi_module --without-http_geo_module --without-http_gzip_module --without-http_limit_req_module --without-http_limit_conn_module --without-http_map_module --without-http_memcached_module --without-http_proxy_module --without-http_referer_module --without-http_scgi_module --without-http_ssi_module --without-http_split_clients_module --without-http_upstream_ip_hash_module --without-http_userid_module --without-http_uwsgi_module --with-http_perl_module --add-module=external_module/headers-more-nginx-module-0.25 --with-http_ssl_module --without-mail_imap_module --without-mail_pop3_module --without-mail_smtp_module --user=nginx --group=nginx

h2o

commit 06cfffe9b104f2e6242a1ebe16d369b87e7aa14d

lwan

commit 5cc85b98b6fe6f8b5767c94262ed958c704577d8

nginx.conf

user nginx nginx;
worker_processes 4;

error_log /var/log/nginx/error_log info;

events {
        worker_connections 20480;
        multi_accept on;
        use epoll;
}

http {
        default_type  application/octet-stream;

        access_log  off;

        sendfile        on;
        tcp_nopush      on;
        tcp_nodelay     on;
        server_tokens   off;
        index   index.html;

        client_body_buffer_size 10K;
        client_header_buffer_size 1k;
        client_max_body_size 8m;
        large_client_header_buffers 2 1k;

        client_body_timeout 12;
        client_header_timeout 12;

        keepalive_timeout 65;
        keepalive_requests 1000000;
        send_timeout 10;

        open_file_cache max=100000 inactive=20s;
        open_file_cache_valid 60s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;

        server {
                listen       80;
                server_name  localhost;
                root         /home/ec2-user/h2o/examples/doc_root;
        }
}

h2o.conf

# to find out the configuration commands, run: h2o --help
max-connections: 2048
num-threads: 4
listen: 80
hosts:
  default:
    paths:
      /:
        file.dir: /home/ec2-user/h2o/examples/doc_root

lwan.conf

keep_alive_timeout = 15
quiet = true
reuse_port = false
expires = 1M 1w

listener *:80 {
    serve_files / {
            path = /home/ec2-user/h2o/examples/doc_root
    }
}

Results

nginx

$ ./wrk -t 4 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   198.05us   50.58us   1.37ms   82.86%
    Req/Sec    10.31k     0.97k   12.22k    73.49%
  392143 requests in 10.00s, 90.88MB read
Requests/sec:  39216.84
Transfer/sec:      9.09MB
$ ./wrk -t 4 -c 20 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   353.20us   95.61us   3.31ms   83.82%
    Req/Sec    14.81k     1.78k   18.78k    74.60%
  561377 requests in 10.00s, 130.10MB read
Requests/sec:  56141.62
Transfer/sec:     13.01MB
$ ./wrk -t 4 -c 50 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   802.13us  592.02us  19.31ms   98.03%
    Req/Sec    16.32k     2.36k   21.11k    78.04%
  617477 requests in 10.00s, 143.10MB read
Requests/sec:  61750.84
Transfer/sec:     14.31MB
$ ./wrk -t 4 -c 100 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.55ms  419.76us  25.05ms   95.00%
    Req/Sec    17.15k     1.91k   21.22k    76.38%
  649171 requests in 10.00s, 150.44MB read
Requests/sec:  64921.53
Transfer/sec:     15.05MB
$ ./wrk -t 4 -c 200 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.18ms  567.92us   7.73ms   90.00%
    Req/Sec    16.82k     2.11k   25.11k    79.21%
  637468 requests in 10.00s, 147.73MB read
Requests/sec:  63751.75
Transfer/sec:     14.77MB
$ ./wrk -t 4 -c 500 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.98ms  682.77us  14.61ms   80.78%
    Req/Sec    16.17k     1.23k   19.50k    72.19%
  627540 requests in 10.00s, 145.43MB read
Requests/sec:  62759.08
Transfer/sec:     14.54MB
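
The wrk summaries above can be post-processed mechanically. A minimal sketch (the sample text is pasted from the -c 100 nginx run; the awk pattern works on any wrk summary):

```shell
# Extract the Requests/sec figure from a wrk summary.
# Sample copied from the nginx -c 100 run above.
wrk_output='Running 10s test @ http://10.166.140.91/1
  4 threads and 100 connections
  649171 requests in 10.00s, 150.44MB read
Requests/sec:  64921.53
Transfer/sec:     15.05MB'
printf '%s\n' "$wrk_output" | awk '/^Requests\/sec:/ { print $2 }'
# -> 64921.53
```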

strace

$ sudo strace -c -p 1175 -f
Process 1175 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.59    0.002126           0    101018           setsockopt
 22.73    0.001060           0     50509           writev
 18.72    0.000873           0     50509           sendfile
 12.95    0.000604           0     50610        66 recvfrom
  0.00    0.000000           0       104           write
  0.00    0.000000           0         2           open
  0.00    0.000000           0       103           close
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0         2         1 recvmsg
  0.00    0.000000           0        10           gettimeofday
  0.00    0.000000           0         9           epoll_wait
  0.00    0.000000           0       102           epoll_ctl
  0.00    0.000000           0       103         2 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.004663                253083        69 total
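
Note the setsockopt count is almost exactly twice the writev/sendfile count (101018 vs 50509), which is consistent with nginx toggling TCP_CORK on and off around each sendfile response when tcp_nopush is enabled (an interpretation, not confirmed from the trace itself):

```shell
# Ratio of setsockopt calls to responses in the nginx trace above:
# 101018 setsockopt / 50509 writev+sendfile pairs.
awk 'BEGIN { printf "%.2f setsockopt calls per response\n", 101018 / 50509 }'
# -> 2.00 setsockopt calls per response
```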

h2o

$ ./wrk -t 4 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   211.29us  523.05us  19.19ms   99.82%
    Req/Sec    10.43k     1.12k   12.56k    78.82%
  394688 requests in 10.00s, 83.94MB read
Requests/sec:  39471.50
Transfer/sec:      8.39MB
$ ./wrk -t 4 -c 20 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   308.98us   82.53us   1.78ms   79.50%
    Req/Sec    16.77k     1.77k   21.22k    70.47%
  634874 requests in 10.00s, 135.02MB read
Requests/sec:  63492.33
Transfer/sec:     13.50MB
$ ./wrk -t 4 -c 50 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   684.79us  158.48us   2.63ms   78.17%
    Req/Sec    18.46k     2.24k   24.33k    72.08%
  698770 requests in 10.00s, 148.61MB read
Requests/sec:  69881.55
Transfer/sec:     14.86MB
$ ./wrk -t 4 -c 100 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.39ms  256.41us   4.15ms   70.68%
    Req/Sec    18.95k     2.06k   25.22k    71.10%
  718275 requests in 10.00s, 152.76MB read
Requests/sec:  71832.74
Transfer/sec:     15.28MB
$ ./wrk -t 4 -c 200 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.03ms    0.90ms  29.96ms   93.04%
    Req/Sec    17.75k     2.35k   27.67k    75.86%
  672953 requests in 10.00s, 143.12MB read
Requests/sec:  67300.67
Transfer/sec:     14.31MB
$ ./wrk -t 4 -c 500 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.09ms    1.28ms 198.29ms   89.64%
    Req/Sec    18.24k     2.13k   30.11k    79.29%
  687398 requests in 10.00s, 146.19MB read
Requests/sec:  68743.92
Transfer/sec:     14.62MB

strace

$ sudo strace -c -p 1064 -f 
Process 1064 attached with 5 threads
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.76   13.110000      285000        46        10 futex
  3.98    0.575471           2    302840    100879 read
  1.35    0.195361           2    101015           close
  1.10    0.159450           2    100914           open
  0.97    0.140713           1    100912           writev
  0.96    0.138034           1    100914           fstat
  0.71    0.103127          10     10418           gettimeofday
  0.12    0.017703           5      3479           select
  0.03    0.005026          44       113        12 accept4
  0.00    0.000042           0       101           setsockopt
  0.00    0.000012           0       117           mprotect
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0        46           madvise
  0.00    0.000000           0       102           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   14.444939                721020    100901 total
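
The open/fstat/close counts track the writev count almost one-to-one, suggesting h2o (at this commit) opens and stats the file for every response, whereas the nginx trace above served ~50k responses with only 2 opens thanks to open_file_cache. A quick check of the ratio from the figures above:

```shell
# One open per response in the h2o trace:
# 100914 open calls / 100912 writev calls.
awk 'BEGIN { printf "%.2f opens per response\n", 100914 / 100912 }'
# -> 1.00 opens per response
```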

lwan

$ ./wrk -t 4 -c 10 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   192.99us   74.07us   4.52ms   94.72%
    Req/Sec    10.54k     0.96k   12.67k    74.55%
  399242 requests in 10.00s, 91.38MB read
Requests/sec:  39926.74
Transfer/sec:      9.14MB
$ ./wrk -t 4 -c 20 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   300.36us   73.51us   3.11ms   79.42%
    Req/Sec    17.19k     1.75k   21.56k    72.73%
  651566 requests in 10.00s, 149.13MB read
Requests/sec:  65161.52
Transfer/sec:     14.91MB
$ ./wrk -t 4 -c 50 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   717.89us  201.66us   3.75ms   83.73%
    Req/Sec    17.80k     2.54k   23.67k    72.26%
  674859 requests in 10.00s, 154.46MB read
Requests/sec:  67490.19
Transfer/sec:     15.45MB
$ ./wrk -t 4 -c 100 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.43ms  274.62us   6.36ms   76.70%
    Req/Sec    18.58k     1.88k   22.89k    74.97%
  701859 requests in 10.00s, 160.64MB read
Requests/sec:  70191.07
Transfer/sec:     16.07MB
$ ./wrk -t 4 -c 200 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.96ms  422.10us   6.42ms   82.15%
    Req/Sec    17.98k     1.84k   22.00k    74.87%
  679007 requests in 10.00s, 155.41MB read
Requests/sec:  67905.54
Transfer/sec:     15.54MB
$ ./wrk -t 4 -c 500 http://10.166.140.91/1
Running 10s test @ http://10.166.140.91/1
  4 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.48ms    0.93ms  22.33ms   90.06%
    Req/Sec    17.25k     1.55k   26.10k    83.12%
  665880 requests in 10.00s, 152.41MB read
Requests/sec:  66593.39
Transfer/sec:     15.24MB
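
Collecting the peak Requests/sec from each server's sweep (all three peaked at -c 100 in these runs):

```shell
# Peak Requests/sec per server, copied from the -c 100 runs above,
# sorted best-first on the numeric second field.
printf '%s\n' \
  'nginx 64921.53' \
  'h2o 71832.74' \
  'lwan 70191.07' |
sort -k2 -rn
# -> h2o 71832.74
#    lwan 70191.07
#    nginx 64921.53
```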

strace

$ sudo strace -c -p 32596 -f                                                                                                                                          
Process 32596 attached with 6 threads
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.69   12.271572      119141       103         2 accept4
 33.79    7.870502     2623501         3           nanosleep
 12.67    2.950000     2950000         1           restart_syscall
  0.29    0.067984           0    189040     37074 futex
  0.14    0.033608           0    121938           read
  0.13    0.030468           0    121834           getpeername
  0.13    0.030045           6      5298           epoll_wait
  0.12    0.027697           0    121834           writev
  0.02    0.005405           0    121842           gettid
  0.02    0.003875           1      4237           write
  0.00    0.000027           0       123           close
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0       102           mprotect
  0.00    0.000000           0         8           munmap
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0        26           madvise
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0       101           epoll_ctl
  0.00    0.000000           0         6           openat
  0.00    0.000000           0         6           newfstatat
------ ----------- ----------- --------- --------- ----------------
100.00   23.291183                686516     37076 total
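
Nearly all of the reported time in the lwan trace sits in accept4, nanosleep, and restart_syscall — calls that block or sleep rather than do per-request work (strace -c counts wall time spent inside the call, so this likely reflects idle waiting and debug-build timers rather than overhead). From the figures above:

```shell
# Share of total traced time spent in blocking/sleeping calls
# (accept4 + nanosleep + restart_syscall, out of 23.291183s total).
awk 'BEGIN { printf "%.0f%%\n", 100 * (12.271572 + 7.870502 + 2.950000) / 23.291183 }'
# -> 99%
```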
@lpereira

If you build Lwan in release mode (cmake -DCMAKE_BUILD_TYPE=Release), no syscalls will be restarted and there won't be any futexes either. It should be ~12% faster, judging only by the strace -c output.
