Skip to content

Instantly share code, notes, and snippets.

@janeczku
Last active August 31, 2015 16:54
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save janeczku/ddebc5229b05fc024038 to your computer and use it in GitHub Desktop.
Save janeczku/ddebc5229b05fc024038 to your computer and use it in GitHub Desktop.
Haproxy 1.6-dev DNS client issue report

Haproxy 1.6-dev DNS client issue report

Table of Contents

Steps to reproduce

  1. Compile Haproxy 1.6-dev3 (e.g. make TARGET=linux2628 USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1)
  2. Run ./haproxy -f ./resolver-test.conf -db using the file here: https://gist.github.com/janeczku/e956448b6fb67c63a8cc
  3. Run top and monitor CPU usage for at least a minute

Observed results

Upon running Haproxy 1.6 with a resolvers section and including a backend server whose DNS record is larger than 512 bytes (e.g. server ibm.com ibm.com:80 check resolvers mydns) the following can be observed:

  1. The Haproxy process will start to drain CPU ressources around 30 seconds after starting up. See #1

  2. Haproxy will stop to respond to frontend requests (either immediately or after a timeperiod x)

  3. In rare cases Haproxy will additionally lock itself in a feedback loop with the upstream DNS server, sending a massive flood of identical queries for a specific hostname. This packet storm is amplificated in the receiving direction by the nature of DNS (ANY) replies being magnitude x larger than the queries. This will eventually starve bandwidth and/or system ressources. See #2

  4. In some instances Haproxy may send all its DNS queries triplicated (same package sent repeatedly 1ms apart). See #3

  5. The DNS client is interpreting a high number of DNS replies as invalid (according to "show stat resolvers"). But a Wireshark traffic analysis did not reveal invalid replies. See #4

  6. Haproxy sometimes does not fall back to a QTPYE A/QTPYE AAAA query when the reply to the first query contains no records. The result is that the hostname will never be successfully resolved. See #5

  7. When partial records are included in a truncated QTYPE ANY reply then Haproxy will use that incomplete record instead of falling back to QTYPE A or QTYPE AAAA query. This may lead to the problem where Haproxy incorrectly assumes that the IP address of a server has changed (see below).

Remarks on the current DNS client implementation

I think that not acting on the truncated (TC) flag in a DNS reply and even using partial records found in such a reply is bad practice. RFC2181 stipulates to "completely discard truncated DNS replies" for good reasons: Not discarding truncated replies or at least handling them with care is expected to cause erratic results depending solely on what the upstream resolver includes in a truncated reply's payload.

Some DNS servers will include a partial recordset in truncated replies

In this case, dns_validate_dns_response() will accept the reply as valid and Haproxy will use a potentially incomplete set of address records.

Result

Bad. While Haproxy will succeed to connect to the server, "server IP change" events may be triggered falsely in every resolution cycle (30 seconds by default):

Let's say a hostname's DNS record has two A records. A truncated reply may only contain one of them. Due to round-robin mechanism, which of the A record is included in the truncated reply may change from query to query. Thus on every other resolution cycle dns_get_ip_from_response() will incorrectly assume that the server has changed it's IP and act accordingly.

Some DNS servers truncate the payload mid-record (which is technically legal)

In this case, dns_validate_dns_response() will return a DNS_RESP_INVALID error. This will cause the error callback to stop trying to resolve the hostname and just queue it for the next resolve cycle.

Result

Bad. The hostname will NEVER be resolved.

Some DNS servers don't include any records in a truncated reply (e.g. Google)

In this case, dns_validate_dns_response() will return a DNS_RESP_ANCOUNT_ZERO error. This should cause the error handler to fall back to a QTYPE A or QTYPE AAAA query.

Result

Good. The hostname will be successfully resolved.

TL;DR
Ignoring the truncated TC flag produces erratic results including failure to resolve some hostnames.

Proposed changes to DNS client implementation

  1. Conform with RFC2181: When a DNS reply has the TC truncated flag set, discard that reply and fall back to another QTYPE or to a TCP query.
  2. Don't default to QTYPE ANY queries. Parsing the potentially very large ANY response payload introduces unnecessary overhead. Additionally since we are not interested in MX,SRV or TXT records - why bother with the entire RRset?
  3. DNS client should honor the resolve-prefer setting from the get-go: When it is set to ipv4 then the first DNS query should be a QTYPE A and the fall back a QTYPE AAAA and vice-versa.

Debugging information

Host system info

Debian 7 64-bit

Linux gw-pl-1 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt11-1~bpo70+1 (2015-06-08) x86_64 GNU/Linux

Binary

HA-Proxy version 1.6-dev3-50bdda6 2015/07/22
Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -pg -O2 -pg -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.30 2012-02-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Gprof profile

$ gprof haproxy ./gmon.out

Flat profile (Top 35)

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 17.66     12.32    12.32 757253132     0.00     0.00  wake_expired_tasks
 14.23     22.25     9.93 757253135     0.00     0.00  tv_update_date
 10.92     29.87     7.62 757253131     0.00     0.00  _do_poll
  9.42     36.44     6.57 757253132     0.00     0.00  process_runnable_tasks
  6.62     41.06     4.62 1514546474     0.00     0.00  eb32_insert
  5.93     45.20     4.14 1514502241     0.00     0.00  eb32_lookup_ge
  5.81     49.25     4.05 1514548677     0.00     0.00  eb_delete
  5.23     52.90     3.65 757245535     0.00     0.00  dns_process_resolve
  4.59     56.10     3.20 757271119     0.00     0.00  __task_queue
  3.63     58.63     2.53 757271011     0.00     0.00  __task_wakeup
  3.02     60.74     2.11                             run_poll_loop
  2.65     62.59     1.85 757253131     0.00     0.00  fd_process_cached_events
  2.29     64.19     1.60      338     0.00     0.00  eb_insert_dup
  1.86     65.49     1.30 757253131     0.00     0.00  applet_run_active
  1.22     66.34     0.85    12737     0.00     0.00  __fd_clo
  1.18     67.16     0.82                             eb32_lookup_le
  1.06     67.90     0.74     9410     0.00     0.00  dns_send_query
  1.03     68.62     0.72                             shared_context_set_cache
  0.67     69.09     0.47                             tcp_exec_req_rules
  0.42     69.38     0.30                             _tv_isgt
  0.30     69.59     0.21    12737     0.00     0.00  fd_delete
  0.16     69.70     0.11                             strm_log
  0.03     69.72     0.02                             chk_report_conn_err
  0.01     69.73     0.01    25467     0.00     0.00  process_chk
  0.01     69.74     0.01    12734     0.00     0.00  tcp_connect_server
  0.01     69.75     0.01      610     0.00     0.00  pat_ref_newid
  0.01     69.76     0.01        1     0.01     0.01  pattern_finalize_config
  0.00     69.76     0.00    26369     0.00     0.00  srv_getinter
  0.00     69.76     0.00    25467     0.00     0.00  conn_fd_handler
  0.00     69.76     0.00    25467     0.00     0.00  fd_update_cache
  0.00     69.76     0.00    25467     0.00     0.00  set_server_check_status
  0.00     69.76     0.00    25467     0.00     0.00  tcp_connect_probe
  0.00     69.76     0.00    21955     0.00     0.00  eb32_lookup
  0.00     69.76     0.00    17479     0.00     0.00  dgram_fd_handler
  0.00     69.76     0.00    17479     0.00     0.00  dns_resolve_recv
Call graph (Top 15)

granularity: each sample hit covers 2 byte(s) for 0.01% of 69.76 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     94.5    2.11   63.79                 run_poll_loop [1]
                6.57   15.27 757253132/757253132     process_runnable_tasks [2]
               12.32    8.94 757253132/757253132     wake_expired_tasks [3]
                7.62    9.93 757253131/757253131     _do_poll [4]
                1.85    0.00 757253131/757253131     fd_process_cached_events [12]
                1.30    0.00 757253131/757253131     applet_run_active [18]
                0.00    0.00       1/1           __signal_process_queue [59]
                0.00    0.00       1/757253135     tv_update_date [5]
-----------------------------------------------
                6.57   15.27 757253132/757253132     run_poll_loop [1]
[2]     31.3    6.57   15.27 757253132         process_runnable_tasks [2]
                3.20    2.50 757258269/757271119     __task_queue [6]
                3.65    0.45 757245535/757245535     dns_process_resolve [10]
                2.07    0.00 757248797/1514502241     eb32_lookup_ge [9]
                2.03    0.00 757271010/1514548677     eb_delete [11]
                0.01    1.36   25467/25467       process_chk [17]
                0.00    0.00       8/8           manage_proxy [45]
-----------------------------------------------
               12.32    8.94 757253132/757253132     run_poll_loop [1]
[3]     30.5   12.32    8.94 757253132         wake_expired_tasks [3]
                2.53    2.31 757258270/757271011     __task_wakeup [7]
                2.07    0.00 757253132/1514502241     eb32_lookup_ge [9]
                2.03    0.00 757270818/1514548677     eb_delete [11]
                0.00    0.00   12548/757271119     __task_queue [6]
-----------------------------------------------
                7.62    9.93 757253131/757253131     run_poll_loop [1]
[4]     25.2    7.62    9.93 757253131         _do_poll [4]
                9.93    0.00 757253131/757253135     tv_update_date [5]
-----------------------------------------------
                0.00    0.00       1/757253135     init [14]
                0.00    0.00       1/757253135     run_poll_loop [1]
                0.00    0.00       1/757253135     check_config_validity [15]
                0.00    0.00       1/757253135     soft_stop [58]
                9.93    0.00 757253131/757253135     _do_poll [4]
[5]     14.2    9.93    0.00 757253135         tv_update_date [5]
-----------------------------------------------
                0.00    0.00       1/757271119     trigger_resolution [26]
                0.00    0.00     301/757271119     start_check_task [54]
                0.00    0.00   12548/757271119     wake_expired_tasks [3]
                3.20    2.50 757258269/757271119     process_runnable_tasks [2]
[6]      8.2    3.20    2.50 757271119         __task_queue [6]
                2.31    0.00 757271111/1514546474     eb32_insert [8]
                0.19    0.00      41/338         eb_insert_dup [13]
-----------------------------------------------
                0.00    0.00       8/757271011     soft_stop [58]
                0.00    0.00   12733/757271011     wake_srv_chk [50]
                2.53    2.31 757258270/757271011     wake_expired_tasks [3]
[7]      6.9    2.53    2.31 757271011         __task_wakeup [7]
                2.31    0.00 757271011/1514546474     eb32_insert [8]
-----------------------------------------------
                0.00    0.00       1/1514546474     bind_parse_level [61]
                0.00    0.00     301/1514546474     fwrr_init_server_groups [16]
                0.00    0.00     312/1514546474     check_config_validity [15]
                0.00    0.00    3738/1514546474     trigger_resolution [26]
                2.31    0.00 757271011/1514546474     __task_wakeup [7]
                2.31    0.00 757271111/1514546474     __task_queue [6]
[8]      6.6    4.62    0.00 1514546474         eb32_insert [8]
-----------------------------------------------
                0.00    0.00     312/1514502241     get_next_id [56]
                2.07    0.00 757248797/1514502241     process_runnable_tasks [2]
                2.07    0.00 757253132/1514502241     wake_expired_tasks [3]
[9]      5.9    4.14    0.00 1514502241         eb32_lookup_ge [9]
-----------------------------------------------
                3.65    0.45 757245535/757245535     process_runnable_tasks [2]
[10]     5.9    3.65    0.45 757245535         dns_process_resolve [10]
                0.44    0.00    5622/9410        dns_send_query [22]
                0.00    0.00    2810/3206        snr_resolution_error_cb [38]
                0.00    0.00    2810/6548        dns_reset_resolution [52]
-----------------------------------------------
                0.00    0.00     301/1514548677     deinit [44]
                0.00    0.00    6548/1514548677     dns_reset_resolution [52]
                2.03    0.00 757270818/1514548677     wake_expired_tasks [3]
                2.03    0.00 757271010/1514548677     process_runnable_tasks [2]
[11]     5.8    4.05    0.00 1514548677         eb_delete [11]
-----------------------------------------------
                1.85    0.00 757253131/757253131     run_poll_loop [1]
[12]     2.7    1.85    0.00 757253131         fd_process_cached_events [12]
                0.00    0.00   17479/17479       dgram_fd_handler [39]
                0.00    0.00   25467/25467       conn_fd_handler [49]
-----------------------------------------------
                0.19    0.00      41/338         __task_queue [6]
                1.41    0.00     297/338         fwrr_init_server_groups [16]
[13]     2.3    1.60    0.00     338         eb_insert_dup [13]
-----------------------------------------------
                                                 <spontaneous>
[14]     2.0    0.00    1.43                 init [14]
                0.00    1.41       1/1           check_config_validity [15]
                0.01    0.00       1/1           pattern_finalize_config [35]
                0.00    0.01       1/1           readcfgfile [36]
                0.00    0.00       1/1           start_checks [55]
                0.00    0.00       1/757253135     tv_update_date [5]
                0.00    0.00       1/1           alloc_trash_buffers [196]
                0.00    0.00       1/1           init_acl [204]
                0.00    0.00       1/1           signal_init [217]
                0.00    0.00       1/1           init_stream [211]
                0.00    0.00       1/1           init_task [212]
                0.00    0.00       1/1           init_connection [206]
                0.00    0.00       1/1           init_session [210]
                0.00    0.00       1/1           init_proto_http [209]
                0.00    0.00       1/1           init_pendconn [207]
                0.00    0.00       1/2           init_default_instance [188]
                0.00    0.00       1/1           tlskeys_finalize_config [223]
                0.00    0.00       1/1           init_buffer [205]
                0.00    0.00       1/1           init_pollers [208]
                0.00    0.00       1/1           dns_init_resolvers [200]
                0.00    0.00       1/1           disable_poller [199]
                0.00    0.00       1/319         pool_refill_alloc [105]
-----------------------------------------------
                0.00    1.41       1/1           init [14]
[15]     2.0    0.00    1.41       1         check_config_validity [15]
                0.00    1.41       5/5           fwrr_init_server_groups [16]
                0.00    0.00       1/1           parse_http_req_cond [53]
                0.00    0.00     312/1514546474     eb32_insert [8]
                0.00    0.00     312/312         get_next_id [56]
                0.00    0.00       1/757253135     tv_update_date [5]
                0.00    0.00     303/312         proxy_find_by_name [110]
                0.00    0.00     301/301         parse_logformat_string [120]
                0.00    0.00     301/301         findserver [118]
                0.00    0.00       8/8           smp_resolve_args [154]
                0.00    0.00       8/8           acl_find_targets [149]
                0.00    0.00       8/319         pool_refill_alloc [105]
                0.00    0.00       8/8           stktable_init [157]
                0.00    0.00       4/4           proxy_cfg_ensure_no_http [179]
                0.00    0.00       4/4           propagate_processes [178]
                0.00    0.00       4/8           ssl_sock_free_all_ctx [155]
                0.00    0.00       4/8           ssl_sock_free_ca [156]
                0.00    0.00       3/17          create_pool [129]
                0.00    0.00       2/2           stktable_compatible_sample [191]
                0.00    0.00       1/1           userlist_postinit [224]
-----------------------------------------------

Perf report

$ perf report

Samples: 148K of event 'cpu-clock', Event count (approx.): 37131250000
  20.76%  haproxy  libc-2.13.so          [.] __mcount_internal
  12.61%  haproxy  [kernel.kallsyms]     [k] system_call_after_swapgs
   9.80%  haproxy  [kernel.kallsyms]     [k] pvclock_clocksource_read
   7.89%  haproxy  libc-2.13.so          [.] _mcount
   5.60%  haproxy  perf-14525.map        [.] 0x00007ffc23fc0f3b
   3.31%  haproxy  haproxy               [.] wake_expired_tasks
   2.91%  haproxy  libc-2.13.so          [.] __epoll_wait_nocancel
   2.89%  haproxy  haproxy               [.] tv_update_date
   2.17%  haproxy  haproxy               [.] _do_poll
   1.97%  haproxy  [kernel.kallsyms]     [k] __getnstimeofday
   1.93%  haproxy  [kernel.kallsyms]     [k] sys_epoll_wait
   1.88%  haproxy  [kernel.kallsyms]     [k] _raw_spin_unlock_irqrestore
   1.87%  haproxy  haproxy               [.] eb32_insert
   1.86%  haproxy  haproxy               [.] process_runnable_tasks
   1.79%  haproxy  haproxy               [.] dns_process_resolve
   1.75%  haproxy  [kernel.kallsyms]     [k] __fget_light
   1.49%  haproxy  [kernel.kallsyms]     [k] do_gettimeofday
   1.36%  haproxy  haproxy               [.] eb_delete
   1.26%  haproxy  haproxy               [.] eb32_lookup_ge
   1.18%  haproxy  [kernel.kallsyms]     [k] sys_gettimeofday
   1.15%  haproxy  [kernel.kallsyms]     [k] copy_user_generic_unrolled
   1.01%  haproxy  haproxy               [.] run_poll_loop
   0.92%  haproxy  haproxy               [.] __task_queue
   0.80%  haproxy  [kernel.kallsyms]     [k] cp_start_xmit
   0.65%  haproxy  libc-2.13.so          [.] __gettimeofday
   0.64%  haproxy  haproxy               [.] __task_wakeup
   0.59%  haproxy  haproxy               [.] mcount@plt
   0.54%  haproxy  [kernel.kallsyms]     [k] _copy_to_user
   0.54%  haproxy  haproxy               [.] applet_run_active

Gperftools report

$ google-pprof --text

(pprof) top20
Total: 9554 samples
    4439  46.5%  46.5%     4439  46.5% 0x00007fff3cbb8f3b ??:0
    1801  18.9%  65.3%     1801  18.9% __epoll_wait_nocancel /build/eglibc-ZhUScE/eglibc-2.13/misc/../sysdeps/unix/syscall-template.S:82
     225   2.4%  67.7%      297   3.1% eb_walk_down (inline) /root/src/haproxy/haproxy/ebtree/ebtree.h:454
      80   0.8%  68.5%      111   1.2% _tv_ms_add (inline) /root/src/haproxy/haproxy/src/time.c:36
      71   0.7%  69.2%       84   0.9% wake_expired_tasks /root/src/haproxy/haproxy/src/task.c:141
      66   0.7%  69.9%       71   0.7% tv_update_date /root/src/haproxy/haproxy/src/time.c:165
      64   0.7%  70.6%       72   0.8% eb32_lookup_ge /root/src/haproxy/haproxy/ebtree/eb32tree.c:154
      55   0.6%  71.2%     4847  50.7% __gettimeofday_internal /build/eglibc-ZhUScE/eglibc-2.13/time/../sysdeps/unix/sysv/linux/x86_64/gettimeofday.S:37
      46   0.5%  71.7%       46   0.5% applet_run_active /root/src/haproxy/haproxy/src/applet.c:29
      44   0.5%  72.1%       46   0.5% dns_process_resolve /root/src/haproxy/haproxy/src/dns.c:1054
      43   0.5%  72.6%       59   0.6% __eb32_insert (inline) /root/src/haproxy/haproxy/ebtree/eb32tree.h:293
      41   0.4%  73.0%       52   0.5% dns_process_resolve /root/src/haproxy/haproxy/src/dns.c:1066
      40   0.4%  73.4%       41   0.4% __tv_islt (inline) /root/src/haproxy/haproxy/include/common/time.h:228
      40   0.4%  73.8%       43   0.5% applet_run_active /root/src/haproxy/haproxy/src/applet.c:25
      39   0.4%  74.3%       43   0.5% fd_process_cached_events /root/src/haproxy/haproxy/src/fd.c:207
      36   0.4%  74.6%       36   0.4% __close_nocancel /build/eglibc-ZhUScE/eglibc-2.13/io/../sysdeps/unix/syscall-template.S:82
      36   0.4%  75.0%       36   0.4% __eb_delete (inline) /root/src/haproxy/haproxy/ebtree/ebtree.h:706
      35   0.4%  75.4%       35   0.4% __gettimeofday_internal /build/eglibc-ZhUScE/eglibc-2.13/time/../sysdeps/unix/sysv/linux/x86_64/gettimeofday.S:39
      35   0.4%  75.7%       41   0.4% eb32_insert /root/src/haproxy/haproxy/ebtree/eb32tree.c:26
      33   0.3%  76.1%       75   0.8% __eb32_insert (inline) /root/src/haproxy/haproxy/ebtree/eb32tree.h:294

Attachments

Example truncated DNS reply

Example truncated reply

CPU exhaustion

Haproxy CPU usage

DNS traffic flood

Within one minute after starting Haproxy had sent 332,974 queries to the the upstream resolver. The DNS server log confirmed a continuous rate of ~5500 queries/s for a single identical hostname.

root@gw-pl-1:~/src/haproxy/haproxy# echo "show stat resolvers mydns" | socat /var/run/haproxy/admin.sock stdio
Resolvers section mydns
 nameserver dns1:
  sent: 332974
  valid: 0
  update: 0
  cname: 0
  cname_error: 0
  any_err: 0
  nx: 0
  timeout: 0
  refused: 0
  other: 0
  invalid: 332972
  too_big: 0
  outdated: 1

DNS query packet triplication

Packet triplication

Many DNS replies are interpreted as invalid

$ echo "show stat resolvers mydns" | socat /var/run/haproxy/admin.sock stdio

Resolvers section mydns
 nameserver google1:
  sent: 1234
  valid: 0
  update: 3
  cname: 0
  cname_error: 2
  any_err: 62
  nx: 0
  timeout: 0
  refused: 0
  other: 0
  invalid: 568
  too_big: 0
  outdated: 326
 nameserver google2:
  sent: 1234
  valid: 0
  update: 2
  cname: 0
  cname_error: 2
  any_err: 62
  nx: 0
  timeout: 0
  refused: 0
  other: 0
  invalid: 567
  too_big: 0
  outdated: 359

Fails to resolve hostnames

When there are no records in a truncated reply, one would expect dns_validate_dns_response() to return a DNS_RESP_ANCOUNT_ZERO error upon which the snr_resolution_error_cb() handler would fall back to A or AAAA query. Sometimes this expected behaviour does not happen and Haproxy will continue to query the full record with QTYPE ANY failing to ever resolve the hostname:

Fallback not working

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment