Skip to content

Instantly share code, notes, and snippets.

@SpComb
Last active August 23, 2018 09:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save SpComb/557605da1feeb9bc25fe3eab17e76d38 to your computer and use it in GitHub Desktop.
Save SpComb/557605da1feeb9bc25fe3eab17e76d38 to your computer and use it in GitHub Desktop.
Kube 1.11 CoreDNS + Ubuntu Bionic / Linux 4.15 => parallel DNS A+AAAA query timeouts due to *response* packets getting NF_DROP per nf_conntrack_confirm

Kube 1.11 CoreDNS + Ubuntu Bionic / Linux 4.15 => parallel DNS A+AAAA query timeouts due to response packets getting NF_DROP per nf_conntrack_confirm

Related but different issues:

AFAIK these describe the outging DNS query packets getting dropped per nf_conntrack_confirm collisions, but this is about the DNS response packets getting dropped due to nf_conntrack_confirm returning NF_DROP for the incoming response packet.

This increaes the duplicate get_unique_tuple race window from µs for the outgoing packet path to ms for the DNS server's query RTT.

Does not affect:

  • Ubuntu xenial / Linux 4.4
  • Kube 1.10 / KubeDNS? (assumed)
  • Calico instead of weave
  • succesful query rate improves significantly if tcpdump is running on the host running the coredns pod

Does affect:

  • Docker vs CRI-O
  • 1cpu vs 4cpu host
root@test:/# while ping -c1 google.com > /dev/null; do date; sleep 0.1; done
Wed Aug 22 14:11:48 UTC 2018
Wed Aug 22 14:11:53 UTC 2018
Wed Aug 22 14:11:53 UTC 2018
Wed Aug 22 14:11:53 UTC 2018
Wed Aug 22 14:11:58 UTC 2018
Wed Aug 22 14:12:04 UTC 2018
Wed Aug 22 14:12:04 UTC 2018
Wed Aug 22 14:12:09 UTC 2018
Wed Aug 22 14:12:09 UTC 2018
# ping sendmsg syscall => weave MASQUERADE
ping-24595 [003] ..s1 2741.410296: netif_receive_skb: dev=weave skbaddr=0000000004ccc006 len=84
ping-24595 [003] ..s1 2741.410297: <stack trace>
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.17
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> ip_push_pending_frames
=> raw_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> SYSC_sendto
=> SyS_sendto
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
ping-24595 [003] ..s1 2741.410309: p_get_unique_tuple_0: (get_unique_tuple+0x0/0x590 [nf_nat])
ping-24595 [003] ..s1 2741.410312: <stack trace>
=> get_unique_tuple
=> nf_nat_masquerade_ipv4
=> masquerade_tg
=> ipt_do_table
=> iptable_nat_do_chain
=> nf_nat_ipv4_fn
=> nf_nat_ipv4_out
=> iptable_nat_ipv4_out
=> nf_hook_slow
=> ip_output
=> ip_forward_finish
=> ip_forward
=> ip_rcv_finish
=> ip_sabotage_in
=> nf_hook_slow
=> ip_rcv
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.17
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> ip_push_pending_frames
=> raw_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> SYSC_sendto
=> SyS_sendto
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
# ping syscall send => vethwepl* DNAT
ping-5834 [001] ..s1 5406.214023: netif_receive_skb: dev=vethwepl6823e08 skbaddr=000000003a7dca17 len=82
ping-5834 [001] ..s1 5406.214029: <stack trace>
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.17
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
ping-5834 [001] ..s1 5406.214049: p_get_unique_tuple_0: (get_unique_tuple+0x0/0x590 [nf_nat])
ping-5834 [001] ..s1 5406.214054: <stack trace>
=> get_unique_tuple
=> xt_dnat_target_v0
=> ipt_do_table
=> iptable_nat_do_chain
=> nf_nat_ipv4_fn
=> nf_nat_ipv4_in
=> iptable_nat_ipv4_in
=> nf_hook_slow
=> br_nf_pre_routing
=> nf_hook_slow
=> br_handle_frame
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.17
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
<idle>-0 [003] d.s. 15449.497491: r___nf_conntrack_confirm_0: (ipv4_confirm+0xb4/0xe0 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
<idle>-0 [003] d.s. 15449.497499: <stack trace>
=> [unknown/kretprobe'd]
=> nf_hook_slow
=> ip_local_deliver
=> xfrm4_rcv_encap_finish2
=> xfrm_trans_reinject
=> tasklet_action
=> __do_softirq
=> irq_exit
=> do_IRQ
=> ret_from_intr
=> native_safe_halt
=> default_idle
=> arch_cpu_idle
=> default_idle_call
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> secondary_startup_64
ping-512 [003] d.s1 15449.497728: r___nf_conntrack_confirm_0: (ipv4_confirm+0xb4/0xe0 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
ping-512 [003] d.s1 15449.497739: <stack trace>
=> [unknown/kretprobe'd]
=> nf_hook_slow
=> br_nf_post_routing
=> nf_hook_slow
=> br_forward_finish
=> br_nf_hook_thresh
=> br_nf_forward_finish
=> br_nf_forward_ip
=> nf_hook_slow
=> __br_forward
=> br_forward
=> br_handle_frame_finish
=> br_nf_pre_routing_finish_bridge
=> br_nf_hook_thresh
=> br_nf_pre_routing_finish
=> br_nf_pre_routing
=> nf_hook_slow
=> br_handle_frame
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.17
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
ping-512 [003] d.s1 15449.497785: r___nf_conntrack_confirm_0: (ipv4_confirm+0xb4/0xe0 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
ping-512 [003] d.s1 15449.497795: <stack trace>
=> [unknown/kretprobe'd]
=> nf_hook_slow
=> xfrm4_output
=> ip_local_out
=> iptunnel_xmit
=> udp_tunnel_xmit_skb
=> vxlan_xmit_one
=> vxlan_xmit
=> dev_hard_start_xmit
=> __dev_queue_xmit
=> dev_queue_xmit
=> ovs_vport_send
=> do_output
=> do_execute_actions
=> ovs_execute_actions
=> ovs_dp_process_packet
=> ovs_vport_receive
=> netdev_frame_hook
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.17
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
root@terom-bionic-test:/sys/kernel/debug/tracing# echo 'r __nf_conntrack_confirm ret=$retval' >> kprobe_events
root@terom-bionic-test:/sys/kernel/debug/tracing# echo 'ret == 0' > events/kprobes/r___nf_conntrack_confirm_0/filter # ret=NF_DROP
root@terom-bionic-test:/sys/kernel/debug/tracing# echo 1 > events/kprobes/enable # renable
root@terom-bionic-test:/sys/kernel/debug/tracing# cat trace_pipe
# these kprobe events are correlated with increases in the conntrack insert_failed counter as well as DNS lookup timeouts
<idle>-0 [003] d.s. 7721.958543: r___nf_conntrack_confirm_0: (ipv4_confirm+0xb4/0xe0 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x0
<idle>-0 [003] d.s. 7721.958554: <stack trace>
=> [unknown/kretprobe'd]
=> nf_hook_slow
=> ip_local_deliver
=> xfrm4_rcv_encap_finish2
=> xfrm_trans_reinject
=> tasklet_action
=> __do_softirq
=> irq_exit
=> do_IRQ
=> ret_from_intr
=> native_safe_halt
=> default_idle
=> arch_cpu_idle
=> default_idle_call
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> secondary_startup_64
ping-3534 [000] d... 16959.754500: r___nf_conntrack_confirm_0: (ipv4_confirm+0xca/0x100 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
ping-3534 [000] d... 16959.754509: <stack trace>
=> nf_iterate
=> nf_hook_slow
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> entry_SYSCALL_64_fastpath
ping-3534 [000] d.s1 16959.754556: r___nf_conntrack_confirm_0: (ipv4_confirm+0xca/0x100 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
ping-3534 [000] d.s1 16959.754576: <stack trace>
=> nf_iterate
=> nf_hook_slow
=> br_nf_post_routing
=> nf_iterate
=> nf_hook_slow
=> br_forward_finish
=> br_nf_forward_finish
=> br_nf_forward_ip
=> nf_iterate
=> nf_hook_slow
=> __br_forward
=> br_forward
=> br_handle_frame_finish
=> br_nf_pre_routing_finish_bridge
=> NF_HOOK_THRESH
=> br_nf_pre_routing_finish
=> br_nf_pre_routing
=> nf_iterate
=> nf_hook_slow
=> br_handle_frame
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.19
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> entry_SYSCALL_64_fastpath
ping-3534 [000] d.s1 16959.754595: r___nf_conntrack_confirm_0: (ipv4_confirm+0xca/0x100 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
ping-3534 [000] d.s1 16959.754604: <stack trace>
=> nf_iterate
=> nf_hook_slow
=> ip_local_deliver
=> ip_rcv_finish
=> ip_rcv
=> __netif_receive_skb_core
=> __netif_receive_skb
=> process_backlog
=> net_rx_action
=> __do_softirq
=> do_softirq_own_stack
=> do_softirq.part.19
=> __local_bh_enable_ip
=> ip_finish_output2
=> ip_finish_output
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> entry_SYSCALL_64_fastpath
ping-3534 [000] d... 16959.755346: r___nf_conntrack_confirm_0: (ipv4_confirm+0xca/0x100 [nf_conntrack_ipv4] <- __nf_conntrack_confirm) ret=0x1
ping-3534 [000] d... 16959.755352: <stack trace>
=> nf_iterate
=> nf_hook_slow
=> ip_output
=> ip_local_out
=> ip_send_skb
=> udp_send_skb
=> udp_sendmsg
=> inet_sendmsg
=> sock_sendmsg
=> ___sys_sendmsg
=> __sys_sendmmsg
=> SyS_sendmmsg
=> entry_SYSCALL_64_fastpath
# vethwe-datapath on resolving host
13:06:36.500641 be:e0:77:9b:88:2c > 36:93:b1:a1:3e:a8, ethertype IPv4 (0x0800), length 96: 10.38.0.2.56625 > 10.40.0.1.53: 54725+ A? google.com.default.svc.cluster.local. (54)
13:06:36.500775 be:e0:77:9b:88:2c > 36:93:b1:a1:3e:a8, ethertype IPv4 (0x0800), length 96: 10.38.0.2.56625 > 10.40.0.1.53: 3222+ AAAA? google.com.default.svc.cluster.local. (54)
13:06:36.501695 36:93:b1:a1:3e:a8 > be:e0:77:9b:88:2c, ethertype IPv4 (0x0800), length 189: 10.40.0.1.53 > 10.38.0.2.56625: 3222 NXDomain 0/1/0 (147)
# vethwe-datapath on coredns host
13:06:36.521204 IP 10.38.0.2.56625 > 10.40.0.1.53: 54725+ A? google.com.default.svc.cluster.local. (54)
13:06:36.521218 IP 10.38.0.2.56625 > 10.40.0.1.53: 3222+ AAAA? google.com.default.svc.cluster.local. (54)
13:06:36.521633 IP 10.40.0.1.53 > 10.38.0.2.56625: 3222 NXDomain 0/1/0 (147)
13:06:36.521778 IP 10.40.0.1.53 > 10.38.0.2.56625: 54725 NXDomain 0/1/0 (147)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment