Topology

+--------+     VLAN 280      +------+
|       0+-------------------+      |
| trex   |     VLAN 290      |      |
|       1+-------------------+      |
+--------+                   |      |
                             |      |
+--------+     VLAN 280      |      |
|       0+-------------------+ TOR  |
| dut    |     VLAN 290      |switch|
|       1+-------------------+      |
+--------+                   |      |
                             |      |
+--------+     VLAN 280      |      |
|       0+-------------------+      |
| dut    |     VLAN 290      |      |
|       1+-------------------+      |
+--------+                   +------+

There will be two separate tests:

  • Kernel conntracks
  • Userspace conntrack (DPDK)

Only one of the two DUT machines (dpdk or linux) handles traffic at a time.
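
Which datapath a bridge uses can be verified once br0 is created (see the playbook below). This is only a sanity check, not part of the original setup:

ovs-vsctl get bridge br0 datapath_type
# prints "netdev" on the dpdk machine; empty (kernel datapath) on linux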

Playbook

- hosts: all
  tasks:
    - dnf:
        name: 'http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/OpenStack/rhos-release/rhos-release-latest.noarch.rpm'
        state: present
    - command: rhos-release 16.2 -r 8.4
    - command: dnf config-manager --enable rhosp-rhel-8.4-crb
    - dnf:
        name:
          - perf
          - kernel-debug

- hosts:
    - trex
    - dpdk
  tasks:
    - copy:
        dest: /etc/modules-load.d/vfio-pci.conf
        content: vfio-pci
    - lineinfile:
        path: /etc/default/grub
        regexp: '^GRUB_CMDLINE_LINUX="(.+) quiet"'
        line: 'GRUB_CMDLINE_LINUX="\1 quiet intel_iommu=on iommu=pt default_hugepagesz=1GB hugepagesz=1G hugepages=64"'
        backrefs: yes
        state: present
    - shell: 'grub2-mkconfig > /boot/grub2/grub.cfg'
    - reboot:
    - shell: |
        driverctl set-override 0000:3b:00.0 vfio-pci
        driverctl set-override 0000:3b:00.1 vfio-pci

- hosts: trex
  tasks:
    - git:
        repo: 'https://github.com/rh-nfv-int/trex-core'
        dest: /root/trex-core
        version: master
    - command:
        cmd: './b {{item}}'
        chdir: /root/trex-core/linux_dpdk
      loop:
        - configure
        - build
    - copy:
        dest: /etc/trex_cfg.yaml
        content: |
          - version: 2
            interfaces: ['3b:00.0', '3b:00.1']
            port_info:
              - dest_mac: 'f8:f2:1e:da:ca:a1'
                src_mac: 'f8:f2:1e:da:ca:a0'
              - dest_mac: 'f8:f2:1e:da:ca:a0'
                src_mac: 'f8:f2:1e:da:ca:a1'
            c: 30
            memory:
              mbuf_64: 30720
              mbuf_128: 500000
              mbuf_256: 30717
              mbuf_512: 30720
              mbuf_1024: 30720
              mbuf_2048: 500000
              mbuf_4096: 4096
              mbuf_9k: 4096
            platform:
              master_thread_id: 0
              latency_thread_id: 1
              dual_if:
                - socket: 0
                  threads: [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62]
    - shell: |
        cd /root/trex-core/scripts
        nohup ./t-rex-64 --astf -i >/tmp/trex.log 2>&1 &

- hosts:
    - dpdk
    - linux
  tasks:
    - dnf:
        name:
          - openvswitch2.15
          - openvswitch2.15-test
          - openvswitch2.15-debuginfo
          - conntrack-tools
    - service:
        name: openvswitch
        state: restarted

- hosts: dpdk
  tasks:
    - shell: |
        ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
        ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:3b:00.0
        ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:3b:00.1
        ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x2"
        ovs-vsctl set Interface dpdk0 options:n_rxq_desc=4096 options:n_txq_desc=4096
        ovs-vsctl set Interface dpdk1 options:n_rxq_desc=4096 options:n_txq_desc=4096
        ovs-appctl dpctl/ct-set-maxconns 3000000

- hosts: linux
  tasks:
    - shell: |
        ovs-vsctl add-br br0
        ovs-vsctl add-port br0 ens1f0
        ovs-vsctl add-port br0 ens1f1
        ethtool -G ens1f0 tx 4096 rx 4096
        ethtool -G ens1f1 tx 4096 rx 4096
        echo 3000000 > /proc/sys/net/netfilter/nf_conntrack_max
        systemctl stop irqbalance.service

- hosts:
    - dpdk
    - linux
  tasks:
    - shell: |
        ovs-ofctl add-flow br0 "table=0,priority=0,actions=NORMAL"
        ovs-ofctl add-flow br0 "table=0,priority=10,tcp,ct_state=-trk,actions=ct(table=0)"
        ovs-ofctl add-flow br0 "table=0,priority=5,tcp,ct_state=+trk,actions=ct(commit),NORMAL"
        ovs-ofctl add-flow br0 "table=0,priority=5,udp,ct_state=+trk,actions=ct(commit),NORMAL"
        ovs-ofctl add-flow br0 "table=0,priority=10,udp,ct_state=-trk,actions=ct(table=0)"
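
These rules implement a minimal conntrack pipeline: TCP/UDP packets not yet tracked (ct_state=-trk, priority 10) are sent through ct() and recirculated to table 0; on the second pass (ct_state=+trk, priority 5) the connection is committed and the packet is switched normally. Everything else falls through to the priority 0 NORMAL rule. Once traffic flows, the tracked connections can be inspected, for example with:

ovs-appctl dpctl/dump-conntrack | head
conntrack -C   # kernel conntrack entry count, linux machine only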

Platform details

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
BIOS Model name:     Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
Stepping:            7
CPU MHz:             2837.800
CPU max MHz:         3200.0000
CPU min MHz:         800.0000
BogoMIPS:            4200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            22528K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

# cat /proc/meminfo
MemTotal:       196387748 kB
MemFree:        117410336 kB
MemAvailable:   123909752 kB
Buffers:            5280 kB
Cached:          7396728 kB
SwapCached:            0 kB
Active:          1906844 kB
Inactive:        5689436 kB
Active(anon):       6092 kB
Inactive(anon):   258420 kB
Active(file):    1900752 kB
Inactive(file):  5431016 kB
Unevictable:      814188 kB
Mlocked:          814188 kB
SwapTotal:       4194300 kB
SwapFree:        4194300 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:       1007872 kB
Mapped:           165004 kB
Shmem:             21816 kB
KReclaimable:     391780 kB
Slab:             740412 kB
SReclaimable:     391780 kB
SUnreclaim:       348632 kB
KernelStack:       13552 kB
PageTables:        11108 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    68833740 kB
Committed_AS:     750740 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:            79360 kB
HardwareCorrupted:     0 kB
AnonHugePages:    768000 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:      64
HugePages_Free:       62
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        67108864 kB
DirectMap4k:      400896 kB
DirectMap2M:    11798528 kB
DirectMap1G:    189792256 kB

# lspci | grep X710
3b:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
3b:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)

# head -n2 /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.4 (Ootpa)"

# uname -r
4.18.0-305.el8.x86_64

# ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.15.4
DPDK 20.11.1

Command description

A new "non-drop rate" script, specifically tailored for conntrack stress tests, was created:

https://github.com/rh-nfv-int/trex-core/blob/master/scripts/conntrack_ndr.py

usage: conntrack_ndr.py [-h] [-i MAX_ITERATIONS] [-I ITERATION_SAMPLE_TIME]
                        [-o FILE] [-v] [-S SERVER] [-p SYNC_PORT]
                        [-P ASYNC_PORT] [-t TIMEOUT] [-u UDP_PERCENT]
                        [-n NUM_MESSAGES] [-s MESSAGE_SIZE] [-w SERVER_WAIT]
                        [-m MIN_CPS] [-M MAX_CPS] [-f MAX_FLOWS]
                        [-a ADAPTIVE_WINDOW] [-r RAMP_UP]

Connect to a TRex server and send TCP/UDP traffic at varying rates of
connections per second following a binary search algorithm to find a "non-drop
rate". The traffic is not stopped in between rate changes.

options:
  -h, --help            show this help message and exit
  -i MAX_ITERATIONS, --max-iterations MAX_ITERATIONS
                        Max number of iterations. (default: None)
  -I ITERATION_SAMPLE_TIME, --iteration-sample-time ITERATION_SAMPLE_TIME
                        Iteration sample duration in seconds. (default: 1)
  -o FILE, --output FILE
                        Write last no-error stats to file in the JSON format.
                        (default: None)
  -v, --verbose         Be more verbose. (default: False)

trex server options:
  -S SERVER, --server SERVER
                        TRex server name or IP. (default: localhost)
  -p SYNC_PORT, --sync-port SYNC_PORT
                        RPC port number. (default: 4501)
  -P ASYNC_PORT, --async-port ASYNC_PORT
                        Subscriber port number. (default: 4500)
  -t TIMEOUT, --timeout TIMEOUT
                        TRex connection timeout. (default: 5)

traffic profile options:
  -u UDP_PERCENT, --udp-percent UDP_PERCENT
                        Percentage of UDP connections. (default: 0.0)
  -n NUM_MESSAGES, --num-messages NUM_MESSAGES
                        Number of data messages (request+response) exchanged
                        per connection. (default: 1)
  -s MESSAGE_SIZE, --message-size MESSAGE_SIZE
                        Size of data messages in bytes. (default: 20)
  -w SERVER_WAIT, --server-wait SERVER_WAIT
                        Time in ms that the server waits before sending a
                        response to a request. (default: 0)

rate options:
  -m MIN_CPS, --min-cps MIN_CPS
                        Min number of connections created&destroyed per
                        second. (default: 10000)
  -M MAX_CPS, --max-cps MAX_CPS
                        Max number of connections created&destroyed per
                        second. (default: 500000)
  -f MAX_FLOWS, --max-flows MAX_FLOWS
                        Max number of concurrent active flows. (default: None)
  -a ADAPTIVE_WINDOW, --adaptive-window ADAPTIVE_WINDOW
                        Allowed grow percentage on the binary search window
                        after 3 consecutive "good" or "bad" iterations. By
                        default, the binary search window can only be
                        shrinking at each iteration. If this parameter is non-
                        zero, the upper or lower bounds can be extended by
                        ADAPTIVE_WINDOW percent. This allows automatic optimal
                        range finding but causes the result to never converge.
                        Use this only for development. (default: 0)
  -r RAMP_UP, --ramp-up RAMP_UP
                        Time in seconds before TRex reaches a stable number of
                        connections per second. This setting also affects the
                        period of rate change (iterations). (default: 3)

The following command was used for each run; --min-cps and --max-cps were adjusted per configuration to make the binary search converge faster.

./conntrack_ndr.py --iteration-sample-time 10 --verbose \
        --udp-percent 1 --max-flows 400k --min-cps 50k --max-cps 100k
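
Conceptually, the script narrows the [--min-cps, --max-cps] window until it converges. Below is a minimal sketch of that logic (not the actual script, which is linked above), with run_iteration as a hypothetical stand-in for one TRex measurement:

#!/bin/bash
# Hypothetical stand-in: returns success when no packets were dropped
# while sustaining $1 connections/s for one iteration sample time.
run_iteration() { true; }

lo=50000   # --min-cps
hi=100000  # --max-cps
while [ $((hi - lo)) -gt 1000 ]; do
    mid=$(( (lo + hi) / 2 ))
    if run_iteration "$mid"; then
        lo=$mid   # no drops: the non-drop rate is at least mid
    else
        hi=$mid   # drops seen: the non-drop rate is below mid
    fi
done
echo "non-drop rate ~ $lo cps"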

Profiling: OVS 2.15

Linux: 1 core

NOTE: When testing with kernel conntrack, irqbalance must be stopped and IRQ affinity must be manually configured to ensure reliable and reproducible results.

Flows: active 44.9K (89.5K/s) TX: 387Mb/s (620Kp/s) RX: 387Mb/s (620Kp/s) Size: ~4.5B

Config:

ethtool -L ens1f0 combined 1
ethtool -L ens1f1 combined 1

systemctl stop irqbalance.service
# sed -nre 's/^ ([0-9]+:) .* (i40e-ens1f.*TxRx.*)$/\1 \2/p' /proc/interrupts
# 129: i40e-ens1f0-TxRx-0
# 214: i40e-ens1f1-TxRx-0
echo 2 > /proc/irq/129/smp_affinity_list
echo 2 > /proc/irq/214/smp_affinity_list
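
The same manual pinning is repeated for the 2, 4 and 8 core configurations below. It could equally be scripted; a sketch assuming the i40e IRQ naming shown above:

#!/bin/bash
# Pin each i40e TxRx IRQ to its own even-numbered (NUMA node 0) CPU,
# in the order the IRQs appear in /proc/interrupts.
CPUS=(2 4 6 8 10 12 14 16)
i=0
for irq in $(sed -nre 's/^ *([0-9]+):.*i40e-ens1f[01]-TxRx-[0-9]+.*/\1/p' /proc/interrupts); do
    echo "${CPUS[i]}" > "/proc/irq/$irq/smp_affinity_list"
    i=$(( (i + 1) % ${#CPUS[@]} ))
done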

CPU usage:

%Cpu2  :  0.0 us,  0.3 sy,  0.0 ni,  4.4 id,  0.0 wa,  0.0 hi, 95.2 si,  0.0 st
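
These per-CPU lines appear to be top(1) output; si is softirq time, confirming that almost all of the work happens in kernel softirq context rather than in user space.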

perf top:

  13.08%  [kernel]                              [k] __nf_conntrack_find_get
  10.59%  [kernel]                              [k] masked_flow_lookup
   9.84%  [kernel]                              [k] _raw_spin_lock
   5.23%  [kernel]                              [k] __nf_conntrack_confirm
   3.39%  [kernel]                              [k] dev_gro_receive
   3.26%  [kernel]                              [k] i40e_clean_rx_irq
   2.37%  [kernel]                              [k] i40e_get_current_atr_cnt
   2.34%  [kernel]                              [k] i40e_xmit_frame_ring
   2.29%  [kernel]                              [k] i40e_get_global_fd_count
   1.80%  [kernel]                              [k] __ovs_ct_lookup
   1.64%  [kernel]                              [k] ovs_flow_tbl_lookup_stats
   1.61%  [kernel]                              [k] nf_conntrack_tcp_packet
   1.42%  [kernel]                              [k] hash_conntrack_raw
   1.20%  [kernel]                              [k] __dev_queue_xmit
   1.14%  [kernel]                              [k] nf_ct_seq_offset
   1.12%  [kernel]                              [k] nf_ct_deliver_cached_events
   1.12%  [kernel]                              [k] ovs_ct_execute
   1.11%  [kernel]                              [k] tcp_in_window
   1.11%  [kernel]                              [k] nf_ct_get_tuple
   1.08%  [kernel]                              [k] nf_conntrack_in

Linux: 2 cores

Flows: active 57.6K (114K/s) TX: 497Mb/s (797Kp/s) RX: 497Mb/s (797Kp/s) Size: ~4.5B

Config:

ethtool -L ens1f0 combined 1
ethtool -L ens1f1 combined 1

systemctl stop irqbalance.service
# sed -nre 's/^ ([0-9]+:) .* (i40e-ens1f.*TxRx.*)$/\1 \2/p' /proc/interrupts
# 129: i40e-ens1f0-TxRx-0
# 214: i40e-ens1f1-TxRx-0
echo 2 > /proc/irq/129/smp_affinity_list
echo 6 > /proc/irq/214/smp_affinity_list

CPU usage:

%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,100.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni, 44.4 id,  0.0 wa,  0.0 hi, 55.6 si,  0.0 st

perf top:

  12.89%  [kernel]              [k] __nf_conntrack_find_get
  11.05%  [kernel]              [k] _raw_spin_lock
   8.64%  [kernel]              [k] masked_flow_lookup
   8.53%  [kernel]              [k] __nf_conntrack_confirm
   3.09%  [kernel]              [k] dev_gro_receive
   2.61%  [kernel]              [k] i40e_clean_rx_irq
   2.59%  [kernel]              [k] nf_conntrack_tcp_packet
   2.37%  [kernel]              [k] i40e_xmit_frame_ring
   2.15%  [kernel]              [k] skb_release_head_state
   1.43%  [kernel]              [k] ovs_flow_tbl_lookup_stats
   1.34%  [kernel]              [k] hash_conntrack_raw
   1.34%  [kernel]              [k] nf_ct_seq_offset
   1.30%  [kernel]              [k] __ovs_ct_lookup
   1.29%  [kernel]              [k] __slab_free
   1.27%  [kernel]              [k] ___slab_alloc
   1.13%  [kernel]              [k] nf_conntrack_in
   1.08%  [kernel]              [k] kmem_cache_free
   1.01%  [kernel]              [k] nf_ct_deliver_cached_events

Linux: 4 cores

Flows: active 99.4K (195K/s) TX: 843Mb/s (1.4Mp/s) RX: 843Mb/s (1.4Mp/s) Size: ~4.5B

Config:

ethtool -L ens1f0 combined 2
ethtool -L ens1f1 combined 2

systemctl stop irqbalance.service
# sed -nre 's/^ ([0-9]+:) .* (i40e-ens1f.*TxRx.*)$/\1 \2/p' /proc/interrupts
# 129: i40e-ens1f0-TxRx-0
# 130: i40e-ens1f0-TxRx-1
# 214: i40e-ens1f1-TxRx-0
# 215: i40e-ens1f1-TxRx-1
echo 2 > /proc/irq/129/smp_affinity_list
echo 4 > /proc/irq/130/smp_affinity_list
echo 6 > /proc/irq/214/smp_affinity_list
echo 8 > /proc/irq/215/smp_affinity_list

CPU usage:

%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.3 hi, 99.7 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,100.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni, 48.2 id,  0.0 wa,  0.3 hi, 51.5 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni, 48.2 id,  0.0 wa,  0.3 hi, 51.5 si,  0.0 st

perf top:

  16.47%  [kernel]                                [k] __nf_conntrack_find_get
  12.48%  [kernel]                                [k] __nf_conntrack_confirm
   9.71%  [kernel]                                [k] _raw_spin_lock
   8.00%  [kernel]                                [k] masked_flow_lookup
   2.42%  [kernel]                                [k] i40e_clean_rx_irq
   2.37%  [kernel]                                [k] __ovs_ct_lookup
   2.28%  [kernel]                                [k] nf_conntrack_tcp_packet
   2.09%  [kernel]                                [k] i40e_xmit_frame_ring
   2.02%  [kernel]                                [k] dev_gro_receive
   1.81%  [kernel]                                [k] skb_release_head_state
   1.41%  [kernel]                                [k] nf_conntrack_in
   1.27%  [kernel]                                [k] ovs_flow_tbl_lookup_stats
   1.23%  [kernel]                                [k] __slab_free
   1.21%  [kernel]                                [k] ___slab_alloc
   1.16%  [kernel]                                [k] kmem_cache_free
   1.06%  [kernel]                                [k] hash_conntrack_raw
   0.94%  [kernel]                                [k] napi_gro_receive
   0.94%  [kernel]                                [k] __dev_queue_xmit
   0.90%  [kernel]                                [k] ovs_ct_execute
   0.88%  [kernel]                                [k] nf_ct_seq_offset
   0.85%  [kernel]                                [k] page_frag_free
   0.83%  [kernel]                                [k] nf_ct_deliver_cached_events

Linux: 8 cores

Flows: active 128K (255K/s) TX: 1.1Gb/s (1.8Mp/s) RX: 1.1Gb/s (1.8Mp/s) Size: ~4.5B

Config:

ethtool -L ens1f1 combined 4
ethtool -L ens1f0 combined 4

systemctl stop irqbalance.service
# sed -nre 's/^ ([0-9]+:) .* (i40e-ens1f.*TxRx.*)$/\1 \2/p' /proc/interrupts
# 129: i40e-ens1f0-TxRx-0
# 130: i40e-ens1f0-TxRx-1
# 131: i40e-ens1f0-TxRx-2
# 132: i40e-ens1f0-TxRx-3
# 214: i40e-ens1f1-TxRx-0
# 215: i40e-ens1f1-TxRx-1
# 216: i40e-ens1f1-TxRx-2
# 217: i40e-ens1f1-TxRx-3
echo 2 > /proc/irq/129/smp_affinity_list
echo 4 > /proc/irq/130/smp_affinity_list
echo 6 > /proc/irq/131/smp_affinity_list
echo 8 > /proc/irq/132/smp_affinity_list
echo 10 > /proc/irq/214/smp_affinity_list
echo 12 > /proc/irq/215/smp_affinity_list
echo 14 > /proc/irq/216/smp_affinity_list
echo 16 > /proc/irq/217/smp_affinity_list

CPU usage:

%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni, 46.4 id,  0.0 wa,  0.0 hi, 53.6 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni, 28.1 id,  0.0 wa,  0.3 hi, 71.6 si,  0.0 st
%Cpu6  :  0.0 us,  0.7 sy,  0.0 ni,  0.7 id,  0.0 wa,  0.3 hi, 98.3 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni, 27.9 id,  0.0 wa,  0.3 hi, 71.8 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni, 89.0 id,  0.0 wa,  0.0 hi, 11.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni, 64.8 id,  0.0 wa,  0.3 hi, 34.9 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni, 40.9 id,  0.0 wa,  0.3 hi, 58.8 si,  0.0 st
%Cpu16 :  0.0 us,  0.0 sy,  0.0 ni, 64.9 id,  0.0 wa,  0.3 hi, 34.8 si,  0.0 st

perf top:

  15.31%  [kernel]              [k] __nf_conntrack_find_get
  10.85%  [kernel]              [k] __nf_conntrack_confirm
   9.40%  [kernel]              [k] _raw_spin_lock
   8.09%  [kernel]              [k] masked_flow_lookup
   2.61%  [kernel]              [k] i40e_xmit_frame_ring
   2.14%  [kernel]              [k] i40e_clean_rx_irq
   2.09%  [kernel]              [k] dev_gro_receive
   2.06%  [kernel]              [k] __ovs_ct_lookup
   2.04%  [kernel]              [k] nf_conntrack_tcp_packet
   2.04%  [kernel]              [k] __slab_free
   2.03%  [kernel]              [k] skb_release_head_state
   1.38%  [kernel]              [k] nf_conntrack_in
   1.23%  [kernel]              [k] ovs_flow_tbl_lookup_stats
   1.11%  [kernel]              [k] __dev_queue_xmit

DPDK: 1 core, no conntrack flows

Flows: active 498K (1.0M/s) TX: 4.3Gb/s (7.0Mp/s) RX: 4.3Gb/s (7.0Mp/s) Size: ~4.5B

CPU usage:

pmd thread numa_id 0 core_id 2:
  isolated : false
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 51 %
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 40 %
  overhead:  5 %
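
For reference, this per-PMD breakdown is presumably the output of:

ovs-appctl dpif-netdev/pmd-rxq-show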

perf top:

  32.88%  ovs-vswitchd          [.] dp_netdev_input__
  15.87%  ovs-vswitchd          [.] miniflow_extract
   7.95%  ovs-vswitchd          [.] dpcls_subtable_lookup_mf_u0w5_u1w1
   7.34%  ovs-vswitchd          [.] i40e_xmit_fixed_burst_vec_avx2
   5.17%  ovs-vswitchd          [.] i40e_recv_pkts_vec_avx2
   4.12%  ovs-vswitchd          [.] fast_path_processing
   2.42%  ovs-vswitchd          [.] non_atomic_ullong_add
   1.94%  ovs-vswitchd          [.] cmap_find_by_index
   1.86%  ovs-vswitchd          [.] netdev_send
   1.68%  ovs-vswitchd          [.] dp_execute_output_action
   1.53%  ovs-vswitchd          [.] cmap_find_batch
   1.41%  ovs-vswitchd          [.] cmap_find_index
   1.24%  [vdso]                [.] __vdso_clock_gettime
   1.21%  ovs-vswitchd          [.] dp_netdev_pmd_flush_output_on_port
   1.09%  ovs-vswitchd          [.] pmd_perf_end_iteration
   1.08%  ovs-vswitchd          [.] dp_netdev_process_rxq_port

DPDK: 1 core

Flows: active 23.4K (46.6K/s) TX: 202Mb/s (324Kp/s) RX: 202Mb/s (324Kp/s) Size: ~4.5B

Config:

ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x4"
ovs-vsctl set Interface dpdk0 options:n_rxq=1
ovs-vsctl set Interface dpdk1 options:n_rxq=1
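
pmd-cpu-mask is a hexadecimal CPU bitmask: bit N selects CPU N. The masks used in this document decode as:

0x2     -> CPU 1 (playbook default)
0x4     -> CPU 2
0x14    -> CPUs 2,4
0x154   -> CPUs 2,4,6,8
0x15554 -> CPUs 2,4,6,8,10,12,14,16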

CPU usage:

pmd thread numa_id 1 core_id 2:
  isolated : false
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 30 %
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 19 %

perf top:

  14.77%  ovs-vswitchd        [.] dp_netdev_process_rxq_port
  11.45%  [vdso]              [.] __vdso_clock_gettime
   9.07%  ovs-vswitchd        [.] i40e_recv_pkts_vec_avx2
   7.84%  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
   4.35%  ovs-vswitchd        [.] pmd_thread_main
   4.30%  libpthread-2.28.so  [.] __pthread_mutex_unlock_usercnt
   4.12%  ovs-vswitchd        [.] pmd_perf_end_iteration
   3.88%  libpthread-2.28.so  [.] __pthread_mutex_lock
   3.58%  ovs-vswitchd        [.] dpcls_subtable_lookup_mf_u0w5_u1w1
   2.98%  ovs-vswitchd        [.] netdev_rxq_recv
   2.94%  ovs-vswitchd        [.] dp_netdev_input__
   2.70%  ovs-vswitchd        [.] miniflow_extract
   2.29%  ovs-vswitchd        [.] cmap_find
   1.90%  ovs-vswitchd        [.] time_timespec__
   1.79%  ovs-vswitchd        [.] conntrack_execute
   1.77%  ovs-vswitchd        [.] dpcls_subtable_lookup_mf_u0w4_u1w1
   1.38%  ovs-vswitchd        [.] fast_path_processing
   1.29%  ovs-vswitchd        [.] cmap_find_batch
   1.27%  ovs-vswitchd        [.] time_usec
   1.03%  libc-2.28.so        [.] clock_gettime@GLIBC_2.2.5
   0.97%  ovs-vswitchd        [.] conn_key_hash
   0.82%  ovs-vswitchd        [.] pmd_perf_start_iteration
   0.80%  ovs-vswitchd        [.] tcp_conn_update
   0.80%  ovs-vswitchd        [.] cmap_insert_dup
   0.76%  ovs-vswitchd        [.] conn_key_cmp
   0.74%  libc-2.28.so        [.] __libc_calloc
   0.68%  ovs-vswitchd        [.] ovs_mutex_lock_at
   0.67%  [kernel]            [k] native_queued_spin_lock_slowpath
   0.60%  ovs-vswitchd        [.] i40e_xmit_fixed_burst_vec_avx2
   0.54%  ovs-vswitchd        [.] ipf_preprocess_conntrack

DPDK: 2 cores

Flows: active 23.9K (47.6K/s) TX: 206Mb/s (331Kp/s) RX: 206Mb/s (331Kp/s) Size: ~4.5B

Config:

ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x14"
ovs-vsctl set Interface dpdk0 options:n_rxq=1
ovs-vsctl set Interface dpdk1 options:n_rxq=1

CPU usage:

pmd thread numa_id 0 core_id 2:
  isolated : false
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 25 %
pmd thread numa_id 0 core_id 4:
  isolated : false
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 20 %

perf top:

  19.73%  [vdso]              [.] __vdso_clock_gettime
  12.04%  ovs-vswitchd        [.] dp_netdev_process_rxq_port
   7.67%  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
   7.19%  ovs-vswitchd        [.] i40e_recv_pkts_vec_avx2
   7.19%  ovs-vswitchd        [.] pmd_perf_end_iteration
   5.43%  ovs-vswitchd        [.] pmd_thread_main
   4.75%  libpthread-2.28.so  [.] __pthread_mutex_lock
   4.02%  ovs-vswitchd        [.] netdev_rxq_recv
   3.71%  libpthread-2.28.so  [.] __pthread_mutex_unlock_usercnt
   3.18%  ovs-vswitchd        [.] time_timespec__
   2.18%  ovs-vswitchd        [.] time_usec
   1.83%  ovs-vswitchd        [.] pmd_perf_start_iteration
   1.81%  libc-2.28.so        [.] clock_gettime@GLIBC_2.2.5
   1.61%  ovs-vswitchd        [.] dp_netdev_input__
   1.51%  ovs-vswitchd        [.] cmap_find
   1.34%  ovs-vswitchd        [.] dpcls_subtable_lookup_mf_u0w4_u1w1
   1.16%  ovs-vswitchd        [.] dpcls_subtable_lookup_mf_u0w5_u1w1
   0.95%  ovs-vswitchd        [.] conntrack_execute
   0.89%  ovs-vswitchd        [.] miniflow_extract
   0.73%  ovs-vswitchd        [.] ovs_mutex_lock_at
   0.68%  ovs-vswitchd        [.] tcp_conn_update
   0.66%  ovs-vswitchd        [.] fast_path_processing
   0.63%  ovs-vswitchd        [.] conn_key_hash
   0.60%  ovs-vswitchd        [.] cmap_find_batch
   0.58%  ovs-vswitchd        [.] conn_key_cmp

DPDK: 4 cores

Flows: active 28.2K (56.0K/s) TX: 243Mb/s (389Kp/s) RX: 243Mb/s (389Kp/s) Size: ~4.5B

Config:

ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x154"
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2

CPU usage:

pmd thread numa_id 0 core_id 2:
  isolated : false
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 48 %
pmd thread numa_id 0 core_id 4:
  isolated : false
  port: dpdk1             queue-id:  1 (enabled)   pmd usage: 41 %
pmd thread numa_id 0 core_id 6:
  isolated : false
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 43 %
pmd thread numa_id 0 core_id 8:
  isolated : false
  port: dpdk0             queue-id:  1 (enabled)   pmd usage: 47 %

perf top:

  18.25%  [vdso]              [.] __vdso_clock_gettime
  16.53%  libpthread-2.28.so  [.] __pthread_mutex_lock
  11.29%  ovs-vswitchd        [.] dp_netdev_process_rxq_port
   7.65%  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
   6.76%  ovs-vswitchd        [.] i40e_recv_pkts_vec_avx2
   6.70%  ovs-vswitchd        [.] pmd_perf_end_iteration
   5.15%  ovs-vswitchd        [.] pmd_thread_main
   3.59%  libpthread-2.28.so  [.] __pthread_mutex_unlock_usercnt
   3.06%  ovs-vswitchd        [.] time_timespec__
   2.08%  ovs-vswitchd        [.] time_usec
   1.84%  ovs-vswitchd        [.] netdev_rxq_recv
   1.71%  libc-2.28.so        [.] clock_gettime@GLIBC_2.2.5
   1.53%  ovs-vswitchd        [.] pmd_perf_start_iteration
   1.32%  ovs-vswitchd        [.] ovs_mutex_lock_at
   1.30%  ovs-vswitchd        [.] ovs_mutex_unlock
   1.00%  ovs-vswitchd        [.] dp_netdev_input__
   0.83%  ovs-vswitchd        [.] cmap_find
   0.70%  ovs-vswitchd        [.] dpcls_subtable_lookup_mf_u0w4_u1w1
   0.55%  ovs-vswitchd        [.] conntrack_execute

DPDK: 8 cores

Flows: active 31.7K (63.1K/s) TX: 275Mb/s (441Kp/s) RX: 275Mb/s (442Kp/s) Size: ~4.5B

Config:

ovs-vsctl set open_vswitch . other_config:pmd-cpu-mask="0x15554"
ovs-vsctl set Interface dpdk0 options:n_rxq=4
ovs-vsctl set Interface dpdk1 options:n_rxq=4

CPU usage:

pmd thread numa_id 0 core_id 2:
  isolated : false
  port: dpdk1             queue-id:  0 (enabled)   pmd usage: 62 %
pmd thread numa_id 0 core_id 4:
  isolated : false
  port: dpdk1             queue-id:  3 (enabled)   pmd usage: 63 %
pmd thread numa_id 0 core_id 6:
  isolated : false
  port: dpdk1             queue-id:  1 (enabled)   pmd usage: 63 %
pmd thread numa_id 0 core_id 8:
  isolated : false
  port: dpdk0             queue-id:  1 (enabled)   pmd usage: 69 %
pmd thread numa_id 0 core_id 10:
  isolated : false
  port: dpdk0             queue-id:  0 (enabled)   pmd usage: 70 %
pmd thread numa_id 0 core_id 12:
  isolated : false
  port: dpdk0             queue-id:  3 (enabled)   pmd usage: 70 %
pmd thread numa_id 0 core_id 14:
  isolated : false
  port: dpdk0             queue-id:  2 (enabled)   pmd usage: 71 %
pmd thread numa_id 0 core_id 16:
  isolated : false
  port: dpdk1             queue-id:  2 (enabled)   pmd usage: 63 %

perf top:

  28.75%  libpthread-2.28.so  [.] __pthread_mutex_lock
  13.76%  [vdso]              [.] __vdso_clock_gettime
   7.99%  ovs-vswitchd        [.] dp_netdev_process_rxq_port
   5.17%  ovs-vswitchd        [.] pmd_perf_end_iteration
   5.12%  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
   4.53%  ovs-vswitchd        [.] i40e_recv_pkts_vec_avx2
   4.05%  ovs-vswitchd        [.] pmd_thread_main
   3.01%  libpthread-2.28.so  [.] __pthread_mutex_unlock_usercnt
   2.28%  ovs-vswitchd        [.] time_timespec__
   1.52%  ovs-vswitchd        [.] ovs_mutex_lock_at
   1.51%  ovs-vswitchd        [.] time_usec
   1.49%  ovs-vswitchd        [.] ovs_mutex_unlock
   1.42%  ovs-vswitchd        [.] pmd_perf_start_iteration
   1.32%  [kernel]            [k] native_queued_spin_lock_slowpath
   1.26%  libc-2.28.so        [.] clock_gettime@GLIBC_2.2.5
   1.25%  ovs-vswitchd        [.] netdev_rxq_recv
   1.20%  libpthread-2.28.so  [.] __lll_lock_wait
   1.13%  [kernel]            [k] get_futex_value_locked
   0.76%  ovs-vswitchd        [.] dp_netdev_input__
   0.57%  [kernel]            [k] __audit_syscall_entry
   0.52%  [kernel]            [k] futex_wait_setup
   0.52%  [kernel]            [k] _raw_spin_lock
   0.51%  ovs-vswitchd        [.] cmap_find

Comparison matrix

Short connections

Number of data packets per connection: 1 request&reply

Wait time between request & reply: 0ms

Individual connection data size: 40 bytes

./conntrack_ndr.py -n1 -s20 -w0 -u1

Linux: ovs 2.15 (rhel 8.4)

cores/queues  Flows  CPS      Bandwidth  Packet Rate
1c no ct      -      -        -          -
1c/1q         44.9K  89.5K/s  387Mb/s    620Kp/s
2c/1q         57.6K  114K/s   497Mb/s    797Kp/s
4c/2q         99.4K  195K/s   843Mb/s    1.4Mp/s
8c/4q         128K   255K/s   1.1Gb/s    1.8Mp/s

DPDK: ovs 2.15 (rhel 8.4)

cores/queues  Flows  CPS      Bandwidth  Packet Rate
1c no ct      498K   1.0M/s   4.3Gb/s    7.0Mp/s
1c/1q         23.4K  46.6K/s  202Mb/s    324Kp/s
2c/1q         23.9K  47.6K/s  206Mb/s    331Kp/s
4c/2q         28.2K  56.0K/s  243Mb/s    389Kp/s
8c/4q         31.7K  63.1K/s  275Mb/s    441Kp/s

DPDK: ovs 2.17 head

cores/queues  Flows  CPS      Bandwidth  Packet Rate
1c no ct      498K   1.0M/s   4.3Gb/s    7.0Mp/s
1c/1q         25.1K  50.0K/s  217Mb/s    347Kp/s
2c/1q         27.6K  55.0K/s  238Mb/s    382Kp/s
4c/2q         28.1K  56.0K/s  242Mb/s    389Kp/s
8c/4q         32.7K  64.7K/s  280Mb/s    450Kp/s

DPDK: ovs 2.17 + Paolo series

cores/queues  Flows  CPS     Bandwidth  Packet Rate  Diff with 2.17
1c no ct      498K   1.0M/s  4.3Gb/s    7.0Mp/s      -
1c/1q         84.6K  169K/s  730Mb/s    1.2Mp/s      +237%
2c/1q         130K   260K/s  1.1Gb/s    1.8Mp/s      +371%
4c/2q         177K   355K/s  1.5Gb/s    2.5Mp/s      +529%
8c/4q         197K   394K/s  1.7Gb/s    2.7Mp/s      +500%
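
The Diff columns here and below appear to be the relative CPS change versus the ovs 2.17 head baseline: for example, 169K/s against the 50.0K/s baseline is roughly +238%, matching the +237% shown once the unrounded measurements are used.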

DPDK: ovs 2.17 + Gaetan series

cores/queues  Flows  CPS     Bandwidth  Packet Rate  Diff with 2.17
1c no ct      -      -       -          -            -
1c/1q         86.4K  172K/s  744Mb/s    1.2Mp/s      +245%
2c/1q         100K   199K/s  867Mb/s    1.4Mp/s      +262%
4c/2q         103K   204K/s  878Mb/s    1.4Mp/s      +266%
8c/4q         0*     0*      0*         0*           -

* There is always packet loss after a while.

Longer connections

Number of data packets per connection: 200 requests&replies

Wait time between each request & reply: 500ms

Individual connection data size + duration: ~4M bytes over 100 seconds

./conntrack_ndr.py -n200 -s20 -w500 -u1

DPDK: ovs 2.17 head

cores/queues  Flows  CPS      Bandwidth  Packet Rate
1c no ct      1.2M   11.5K/s  4.6Gb/s    6.9Mp/s
1c/1q         201K   2.0K/s   799Mb/s    1.2Mp/s
2c/1q         252K   2.5K/s   1.0Gb/s    1.5Mp/s
4c/2q         257K   2.5K/s   1.0Gb/s    1.5Mp/s
8c/4q         259K   2.6K/s   1.0Gb/s    1.6Mp/s

DPDK: ovs 2.17 + Paolo series

cores/queues  Flows  CPS      Bandwidth  Packet Rate  Diff with 2.17
1c no ct      1.2M   11.5K/s  4.6Gb/s    6.9Mp/s      -
1c/1q         201K   2.0K/s   798Mb/s    1.2Mp/s      +0%
2c/1q         352K   3.5K/s   1.4Gb/s    2.1Mp/s      +39%
4c/2q         623K   6.2K/s   2.5Gb/s    3.7Mp/s      +142%
8c/4q         1.1M   10.5K/s  4.2Gb/s    6.3Mp/s      +324%

DPDK: ovs 2.17 + Gaetan series

cores/queues  Flows  CPS      Bandwidth  Packet Rate  Diff with 2.17
1c no ct      1.2M   11.5K/s  4.6Gb/s    6.9Mp/s      -
1c/1q         181K   1.8K/s   719Mb/s    1.1Mp/s      -10%
2c/1q         181K   1.8K/s   720Mb/s    1.1Mp/s      -28%
4c/2q         171K   1.7K/s   680Mb/s    1.0Mp/s      -33%
8c/4q         111K   1.1K/s   440Mb/s    662Kp/s      -57%