Table of Contents
- Steps to reproduce
- Observed results
- Remarks on current DNS client implementation
- Proposed changes
- Debugging information
- Attachments
- Compile Haproxy 1.6-dev3 (e.g.
make TARGET=linux2628 USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1
) - Run
./haproxy -f ./resolver-test.conf -db
using the file here: https://gist.github.com/janeczku/e956448b6fb67c63a8cc - Run
top
and monitor CPU usage for at least a minute
Upon running Haproxy 1.6 with a resolvers
section and including a backend server whose DNS record is larger than 512 bytes (e.g. server ibm.com ibm.com:80 check resolvers mydns
) the following can be observed:
-
The Haproxy process will start to drain CPU ressources around 30 seconds after starting up. See #1
-
Haproxy will stop to respond to frontend requests (either immediately or after a timeperiod x)
-
In rare cases Haproxy will additionally lock itself in a feedback loop with the upstream DNS server, sending a massive flood of identical queries for a specific hostname. This packet storm is amplificated in the receiving direction by the nature of DNS (ANY) replies being magnitude x larger than the queries. This will eventually starve bandwidth and/or system ressources. See #2
-
In some instances Haproxy may send all its DNS queries triplicated (same package sent repeatedly 1ms apart). See #3
-
The DNS client is interpreting a high number of DNS replies as invalid (according to "show stat resolvers"). But a Wireshark traffic analysis did not reveal invalid replies. See #4
-
Haproxy sometimes does not fall back to a
QTPYE A
/QTPYE AAAA
query when the reply to the first query contains no records. The result is that the hostname will never be successfully resolved. See #5 -
When partial records are included in a truncated
QTYPE ANY
reply then Haproxy will use that incomplete record instead of falling back toQTYPE A
orQTYPE AAAA
query. This may lead to the problem where Haproxy incorrectly assumes that the IP address of a server has changed (see below).
I think that not acting on the truncated (TC) flag in a DNS reply and even using partial records found in such a reply is bad practice. RFC2181 stipulates to "completely discard truncated DNS replies" for good reasons: Not discarding truncated replies or at least handling them with care is expected to cause erratic results depending solely on what the upstream resolver includes in a truncated reply's payload.
In this case, dns_validate_dns_response()
will accept the reply as valid and Haproxy will use a potentially incomplete set of address records.
Bad. While Haproxy will succeed to connect to the server, "server IP change" events may be triggered falsely in every resolution cycle (30 seconds by default):
Let's say a hostname's DNS record has two A records. A truncated reply may only contain one of them. Due to round-robin mechanism, which of the A record is included in the truncated reply may change from query to query. Thus on every other resolution cycle
dns_get_ip_from_response()
will incorrectly assume that the server has changed it's IP and act accordingly.
In this case, dns_validate_dns_response()
will return a DNS_RESP_INVALID
error. This will cause the error callback to stop trying to resolve the hostname and just queue it for the next resolve cycle.
Bad. The hostname will NEVER be resolved.
In this case, dns_validate_dns_response()
will return a DNS_RESP_ANCOUNT_ZERO
error. This should cause the error handler to fall back to a QTYPE A
or QTYPE AAAA
query.
Good. The hostname will be successfully resolved.
TL;DR
Ignoring the truncated TC flag produces erratic results including failure to resolve some hostnames.
- Conform with RFC2181: When a DNS reply has the TC truncated flag set, discard that reply and fall back to another QTYPE or to a TCP query.
- Don't default to
QTYPE ANY
queries. Parsing the potentially very large ANY response payload introduces unnecessary overhead. Additionally since we are not interested in MX,SRV or TXT records - why bother with the entire RRset? - DNS client should honor the
resolve-prefer
setting from the get-go: When it is set toipv4
then the first DNS query should be aQTYPE A
and the fall back aQTYPE AAAA
and vice-versa.
Debian 7 64-bit
Linux gw-pl-1 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt11-1~bpo70+1 (2015-06-08) x86_64 GNU/Linux
HA-Proxy version 1.6-dev3-50bdda6 2015/07/22
Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -pg -O2 -pg -fno-strict-aliasing -Wdeclaration-after-statement
OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.30 2012-02-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
$ gprof haproxy ./gmon.out
Flat profile (Top 35)
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
17.66 12.32 12.32 757253132 0.00 0.00 wake_expired_tasks
14.23 22.25 9.93 757253135 0.00 0.00 tv_update_date
10.92 29.87 7.62 757253131 0.00 0.00 _do_poll
9.42 36.44 6.57 757253132 0.00 0.00 process_runnable_tasks
6.62 41.06 4.62 1514546474 0.00 0.00 eb32_insert
5.93 45.20 4.14 1514502241 0.00 0.00 eb32_lookup_ge
5.81 49.25 4.05 1514548677 0.00 0.00 eb_delete
5.23 52.90 3.65 757245535 0.00 0.00 dns_process_resolve
4.59 56.10 3.20 757271119 0.00 0.00 __task_queue
3.63 58.63 2.53 757271011 0.00 0.00 __task_wakeup
3.02 60.74 2.11 run_poll_loop
2.65 62.59 1.85 757253131 0.00 0.00 fd_process_cached_events
2.29 64.19 1.60 338 0.00 0.00 eb_insert_dup
1.86 65.49 1.30 757253131 0.00 0.00 applet_run_active
1.22 66.34 0.85 12737 0.00 0.00 __fd_clo
1.18 67.16 0.82 eb32_lookup_le
1.06 67.90 0.74 9410 0.00 0.00 dns_send_query
1.03 68.62 0.72 shared_context_set_cache
0.67 69.09 0.47 tcp_exec_req_rules
0.42 69.38 0.30 _tv_isgt
0.30 69.59 0.21 12737 0.00 0.00 fd_delete
0.16 69.70 0.11 strm_log
0.03 69.72 0.02 chk_report_conn_err
0.01 69.73 0.01 25467 0.00 0.00 process_chk
0.01 69.74 0.01 12734 0.00 0.00 tcp_connect_server
0.01 69.75 0.01 610 0.00 0.00 pat_ref_newid
0.01 69.76 0.01 1 0.01 0.01 pattern_finalize_config
0.00 69.76 0.00 26369 0.00 0.00 srv_getinter
0.00 69.76 0.00 25467 0.00 0.00 conn_fd_handler
0.00 69.76 0.00 25467 0.00 0.00 fd_update_cache
0.00 69.76 0.00 25467 0.00 0.00 set_server_check_status
0.00 69.76 0.00 25467 0.00 0.00 tcp_connect_probe
0.00 69.76 0.00 21955 0.00 0.00 eb32_lookup
0.00 69.76 0.00 17479 0.00 0.00 dgram_fd_handler
0.00 69.76 0.00 17479 0.00 0.00 dns_resolve_recv
Call graph (Top 15)
granularity: each sample hit covers 2 byte(s) for 0.01% of 69.76 seconds
index % time self children called name
<spontaneous>
[1] 94.5 2.11 63.79 run_poll_loop [1]
6.57 15.27 757253132/757253132 process_runnable_tasks [2]
12.32 8.94 757253132/757253132 wake_expired_tasks [3]
7.62 9.93 757253131/757253131 _do_poll [4]
1.85 0.00 757253131/757253131 fd_process_cached_events [12]
1.30 0.00 757253131/757253131 applet_run_active [18]
0.00 0.00 1/1 __signal_process_queue [59]
0.00 0.00 1/757253135 tv_update_date [5]
-----------------------------------------------
6.57 15.27 757253132/757253132 run_poll_loop [1]
[2] 31.3 6.57 15.27 757253132 process_runnable_tasks [2]
3.20 2.50 757258269/757271119 __task_queue [6]
3.65 0.45 757245535/757245535 dns_process_resolve [10]
2.07 0.00 757248797/1514502241 eb32_lookup_ge [9]
2.03 0.00 757271010/1514548677 eb_delete [11]
0.01 1.36 25467/25467 process_chk [17]
0.00 0.00 8/8 manage_proxy [45]
-----------------------------------------------
12.32 8.94 757253132/757253132 run_poll_loop [1]
[3] 30.5 12.32 8.94 757253132 wake_expired_tasks [3]
2.53 2.31 757258270/757271011 __task_wakeup [7]
2.07 0.00 757253132/1514502241 eb32_lookup_ge [9]
2.03 0.00 757270818/1514548677 eb_delete [11]
0.00 0.00 12548/757271119 __task_queue [6]
-----------------------------------------------
7.62 9.93 757253131/757253131 run_poll_loop [1]
[4] 25.2 7.62 9.93 757253131 _do_poll [4]
9.93 0.00 757253131/757253135 tv_update_date [5]
-----------------------------------------------
0.00 0.00 1/757253135 init [14]
0.00 0.00 1/757253135 run_poll_loop [1]
0.00 0.00 1/757253135 check_config_validity [15]
0.00 0.00 1/757253135 soft_stop [58]
9.93 0.00 757253131/757253135 _do_poll [4]
[5] 14.2 9.93 0.00 757253135 tv_update_date [5]
-----------------------------------------------
0.00 0.00 1/757271119 trigger_resolution [26]
0.00 0.00 301/757271119 start_check_task [54]
0.00 0.00 12548/757271119 wake_expired_tasks [3]
3.20 2.50 757258269/757271119 process_runnable_tasks [2]
[6] 8.2 3.20 2.50 757271119 __task_queue [6]
2.31 0.00 757271111/1514546474 eb32_insert [8]
0.19 0.00 41/338 eb_insert_dup [13]
-----------------------------------------------
0.00 0.00 8/757271011 soft_stop [58]
0.00 0.00 12733/757271011 wake_srv_chk [50]
2.53 2.31 757258270/757271011 wake_expired_tasks [3]
[7] 6.9 2.53 2.31 757271011 __task_wakeup [7]
2.31 0.00 757271011/1514546474 eb32_insert [8]
-----------------------------------------------
0.00 0.00 1/1514546474 bind_parse_level [61]
0.00 0.00 301/1514546474 fwrr_init_server_groups [16]
0.00 0.00 312/1514546474 check_config_validity [15]
0.00 0.00 3738/1514546474 trigger_resolution [26]
2.31 0.00 757271011/1514546474 __task_wakeup [7]
2.31 0.00 757271111/1514546474 __task_queue [6]
[8] 6.6 4.62 0.00 1514546474 eb32_insert [8]
-----------------------------------------------
0.00 0.00 312/1514502241 get_next_id [56]
2.07 0.00 757248797/1514502241 process_runnable_tasks [2]
2.07 0.00 757253132/1514502241 wake_expired_tasks [3]
[9] 5.9 4.14 0.00 1514502241 eb32_lookup_ge [9]
-----------------------------------------------
3.65 0.45 757245535/757245535 process_runnable_tasks [2]
[10] 5.9 3.65 0.45 757245535 dns_process_resolve [10]
0.44 0.00 5622/9410 dns_send_query [22]
0.00 0.00 2810/3206 snr_resolution_error_cb [38]
0.00 0.00 2810/6548 dns_reset_resolution [52]
-----------------------------------------------
0.00 0.00 301/1514548677 deinit [44]
0.00 0.00 6548/1514548677 dns_reset_resolution [52]
2.03 0.00 757270818/1514548677 wake_expired_tasks [3]
2.03 0.00 757271010/1514548677 process_runnable_tasks [2]
[11] 5.8 4.05 0.00 1514548677 eb_delete [11]
-----------------------------------------------
1.85 0.00 757253131/757253131 run_poll_loop [1]
[12] 2.7 1.85 0.00 757253131 fd_process_cached_events [12]
0.00 0.00 17479/17479 dgram_fd_handler [39]
0.00 0.00 25467/25467 conn_fd_handler [49]
-----------------------------------------------
0.19 0.00 41/338 __task_queue [6]
1.41 0.00 297/338 fwrr_init_server_groups [16]
[13] 2.3 1.60 0.00 338 eb_insert_dup [13]
-----------------------------------------------
<spontaneous>
[14] 2.0 0.00 1.43 init [14]
0.00 1.41 1/1 check_config_validity [15]
0.01 0.00 1/1 pattern_finalize_config [35]
0.00 0.01 1/1 readcfgfile [36]
0.00 0.00 1/1 start_checks [55]
0.00 0.00 1/757253135 tv_update_date [5]
0.00 0.00 1/1 alloc_trash_buffers [196]
0.00 0.00 1/1 init_acl [204]
0.00 0.00 1/1 signal_init [217]
0.00 0.00 1/1 init_stream [211]
0.00 0.00 1/1 init_task [212]
0.00 0.00 1/1 init_connection [206]
0.00 0.00 1/1 init_session [210]
0.00 0.00 1/1 init_proto_http [209]
0.00 0.00 1/1 init_pendconn [207]
0.00 0.00 1/2 init_default_instance [188]
0.00 0.00 1/1 tlskeys_finalize_config [223]
0.00 0.00 1/1 init_buffer [205]
0.00 0.00 1/1 init_pollers [208]
0.00 0.00 1/1 dns_init_resolvers [200]
0.00 0.00 1/1 disable_poller [199]
0.00 0.00 1/319 pool_refill_alloc [105]
-----------------------------------------------
0.00 1.41 1/1 init [14]
[15] 2.0 0.00 1.41 1 check_config_validity [15]
0.00 1.41 5/5 fwrr_init_server_groups [16]
0.00 0.00 1/1 parse_http_req_cond [53]
0.00 0.00 312/1514546474 eb32_insert [8]
0.00 0.00 312/312 get_next_id [56]
0.00 0.00 1/757253135 tv_update_date [5]
0.00 0.00 303/312 proxy_find_by_name [110]
0.00 0.00 301/301 parse_logformat_string [120]
0.00 0.00 301/301 findserver [118]
0.00 0.00 8/8 smp_resolve_args [154]
0.00 0.00 8/8 acl_find_targets [149]
0.00 0.00 8/319 pool_refill_alloc [105]
0.00 0.00 8/8 stktable_init [157]
0.00 0.00 4/4 proxy_cfg_ensure_no_http [179]
0.00 0.00 4/4 propagate_processes [178]
0.00 0.00 4/8 ssl_sock_free_all_ctx [155]
0.00 0.00 4/8 ssl_sock_free_ca [156]
0.00 0.00 3/17 create_pool [129]
0.00 0.00 2/2 stktable_compatible_sample [191]
0.00 0.00 1/1 userlist_postinit [224]
-----------------------------------------------
$ perf report
Samples: 148K of event 'cpu-clock', Event count (approx.): 37131250000
20.76% haproxy libc-2.13.so [.] __mcount_internal
12.61% haproxy [kernel.kallsyms] [k] system_call_after_swapgs
9.80% haproxy [kernel.kallsyms] [k] pvclock_clocksource_read
7.89% haproxy libc-2.13.so [.] _mcount
5.60% haproxy perf-14525.map [.] 0x00007ffc23fc0f3b
3.31% haproxy haproxy [.] wake_expired_tasks
2.91% haproxy libc-2.13.so [.] __epoll_wait_nocancel
2.89% haproxy haproxy [.] tv_update_date
2.17% haproxy haproxy [.] _do_poll
1.97% haproxy [kernel.kallsyms] [k] __getnstimeofday
1.93% haproxy [kernel.kallsyms] [k] sys_epoll_wait
1.88% haproxy [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.87% haproxy haproxy [.] eb32_insert
1.86% haproxy haproxy [.] process_runnable_tasks
1.79% haproxy haproxy [.] dns_process_resolve
1.75% haproxy [kernel.kallsyms] [k] __fget_light
1.49% haproxy [kernel.kallsyms] [k] do_gettimeofday
1.36% haproxy haproxy [.] eb_delete
1.26% haproxy haproxy [.] eb32_lookup_ge
1.18% haproxy [kernel.kallsyms] [k] sys_gettimeofday
1.15% haproxy [kernel.kallsyms] [k] copy_user_generic_unrolled
1.01% haproxy haproxy [.] run_poll_loop
0.92% haproxy haproxy [.] __task_queue
0.80% haproxy [kernel.kallsyms] [k] cp_start_xmit
0.65% haproxy libc-2.13.so [.] __gettimeofday
0.64% haproxy haproxy [.] __task_wakeup
0.59% haproxy haproxy [.] mcount@plt
0.54% haproxy [kernel.kallsyms] [k] _copy_to_user
0.54% haproxy haproxy [.] applet_run_active
$ google-pprof --text
(pprof) top20
Total: 9554 samples
4439 46.5% 46.5% 4439 46.5% 0x00007fff3cbb8f3b ??:0
1801 18.9% 65.3% 1801 18.9% __epoll_wait_nocancel /build/eglibc-ZhUScE/eglibc-2.13/misc/../sysdeps/unix/syscall-template.S:82
225 2.4% 67.7% 297 3.1% eb_walk_down (inline) /root/src/haproxy/haproxy/ebtree/ebtree.h:454
80 0.8% 68.5% 111 1.2% _tv_ms_add (inline) /root/src/haproxy/haproxy/src/time.c:36
71 0.7% 69.2% 84 0.9% wake_expired_tasks /root/src/haproxy/haproxy/src/task.c:141
66 0.7% 69.9% 71 0.7% tv_update_date /root/src/haproxy/haproxy/src/time.c:165
64 0.7% 70.6% 72 0.8% eb32_lookup_ge /root/src/haproxy/haproxy/ebtree/eb32tree.c:154
55 0.6% 71.2% 4847 50.7% __gettimeofday_internal /build/eglibc-ZhUScE/eglibc-2.13/time/../sysdeps/unix/sysv/linux/x86_64/gettimeofday.S:37
46 0.5% 71.7% 46 0.5% applet_run_active /root/src/haproxy/haproxy/src/applet.c:29
44 0.5% 72.1% 46 0.5% dns_process_resolve /root/src/haproxy/haproxy/src/dns.c:1054
43 0.5% 72.6% 59 0.6% __eb32_insert (inline) /root/src/haproxy/haproxy/ebtree/eb32tree.h:293
41 0.4% 73.0% 52 0.5% dns_process_resolve /root/src/haproxy/haproxy/src/dns.c:1066
40 0.4% 73.4% 41 0.4% __tv_islt (inline) /root/src/haproxy/haproxy/include/common/time.h:228
40 0.4% 73.8% 43 0.5% applet_run_active /root/src/haproxy/haproxy/src/applet.c:25
39 0.4% 74.3% 43 0.5% fd_process_cached_events /root/src/haproxy/haproxy/src/fd.c:207
36 0.4% 74.6% 36 0.4% __close_nocancel /build/eglibc-ZhUScE/eglibc-2.13/io/../sysdeps/unix/syscall-template.S:82
36 0.4% 75.0% 36 0.4% __eb_delete (inline) /root/src/haproxy/haproxy/ebtree/ebtree.h:706
35 0.4% 75.4% 35 0.4% __gettimeofday_internal /build/eglibc-ZhUScE/eglibc-2.13/time/../sysdeps/unix/sysv/linux/x86_64/gettimeofday.S:39
35 0.4% 75.7% 41 0.4% eb32_insert /root/src/haproxy/haproxy/ebtree/eb32tree.c:26
33 0.3% 76.1% 75 0.8% __eb32_insert (inline) /root/src/haproxy/haproxy/ebtree/eb32tree.h:294
Within one minute after starting Haproxy had sent 332,974 queries to the the upstream resolver. The DNS server log confirmed a continuous rate of ~5500 queries/s for a single identical hostname.
root@gw-pl-1:~/src/haproxy/haproxy# echo "show stat resolvers mydns" | socat /var/run/haproxy/admin.sock stdio
Resolvers section mydns
nameserver dns1:
sent: 332974
valid: 0
update: 0
cname: 0
cname_error: 0
any_err: 0
nx: 0
timeout: 0
refused: 0
other: 0
invalid: 332972
too_big: 0
outdated: 1
$ echo "show stat resolvers mydns" | socat /var/run/haproxy/admin.sock stdio
Resolvers section mydns
nameserver google1:
sent: 1234
valid: 0
update: 3
cname: 0
cname_error: 2
any_err: 62
nx: 0
timeout: 0
refused: 0
other: 0
invalid: 568
too_big: 0
outdated: 326
nameserver google2:
sent: 1234
valid: 0
update: 2
cname: 0
cname_error: 2
any_err: 62
nx: 0
timeout: 0
refused: 0
other: 0
invalid: 567
too_big: 0
outdated: 359
When there are no records in a truncated reply, one would expect dns_validate_dns_response()
to return a DNS_RESP_ANCOUNT_ZERO
error upon which the snr_resolution_error_cb()
handler would fall back to A or AAAA query.
Sometimes this expected behaviour does not happen and Haproxy will continue to query the full record with QTYPE ANY
failing to ever resolve the hostname: