I am documenting how I installed zeek (bro) on my Linux machine, which has 36 cores (72 with hyperthreading), using pfring to distribute the load.
My monitoring interfaces are enp134s0f0 and enp216s0f0.
Driver is i40e. This driver is supported by pfring, according to https://www.ntop.org/guides/pf_ring/zc.html.
zeek does not yet support OpenSSL's 1.1 API, so we need an older OpenSSL than the one shipped with Ubuntu 18.04.1:
make -j 32
make test
make install
I used the development version of the master branch. Do not forget to use the --recursive flag when cloning the repository.
I run ./bro -i enp134s0f0 to see if anything is captured (yes). Don't forget to bring the interface up first (ifconfig enp134s0f0 up).
Checksum offloading on the NIC is a very common problem when capturing packets. I use ethtool to deactivate all offloading of checksums on the NIC. The settings before the change:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]
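The listing is long, and only the entries without [fixed] can be changed at all. A minimal sketch to reduce it to the changeable features that are still on. The function name changeable_on is made up for this example, and it only assumes the "name: on|off [fixed]" layout shown above.

```shell
# Print the features that are currently "on" and not marked [fixed],
# i.e. the ones that ethtool -K could still switch off.
# Reads `ethtool -k <nic>` style output on stdin.
changeable_on() {
    grep -v '\[fixed\]' |
        awk -F': ' '$2 == "on" { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage (needs the real NIC, so left as a comment):
#   ethtool -k enp134s0f0 | changeable_on
```

Running it again after switching features off gives a quick view of what is still active.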
Deactivating offloading is trial-and-error. The ethtool-internal names of each property can usually be inferred by taking the first letters of each word, e.g. gro for generic-receive-offload. There are exceptions, e.g.:
rx rx-checksumming
tx tx-checksumming
sg scatter-gather
Deactivating the higher-level properties seems to deactivate the nested ones as well. Some commands seem to switch off other segmentation offloading features at the same time, too.
Caveats for my NIC:
- ufo cannot be changed (but was set to off automatically)
- lro cannot be changed (but was set to off automatically)
This command will do the trick:
for NIC in enp216s0f0 enp134s0f0; do
for ANNOYING in rx tx sg tso gso gro; do ethtool -K $NIC $ANNOYING off; done
done
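Since this is trial-and-error, it can help to look at the commands before running them as root. A small sketch of the same loop as a dry run. The function name offload_off_cmds is an assumption for this example; the feature list is the one from the loop above.

```shell
# Print (not run) one ethtool command per NIC and feature, so the
# list can be reviewed first. Arguments: NIC names.
offload_off_cmds() {
    for nic in "$@"; do
        for feature in rx tx sg tso gso gro; do
            printf 'ethtool -K %s %s off\n' "$nic" "$feature"
        done
    done
}

# Review, then execute as root:
#   offload_off_cmds enp216s0f0 enp134s0f0
#   offload_off_cmds enp216s0f0 enp134s0f0 | sudo sh
```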
Now test with ./bro -i enp134s0f0. Check conn.log to see if we get full connections, i.e. the flags S...fF.
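Eyeballing the history column gets tedious, so here is a rough sketch that counts how many connections show a complete handshake and teardown. It assumes that "full" means the history contains S and h (SYN and SYN-ACK) as well as F and f (FIN from both sides); the function name full_conn_share is made up for this example.

```shell
# Count history strings that contain the originator SYN (S), the
# responder SYN-ACK (h) and a FIN from both sides (F and f).
# Reads one history string per line on stdin.
full_conn_share() {
    awk '
        { total++ }
        /S/ && /h/ && /F/ && /f/ { full++ }
        END { printf "%d of %d\n", full, total }
    '
}

# Usage (bro-cut must be in the PATH):
#   bro-cut history < conn.log | full_conn_share
```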
Building PF_RING 7.4 as described in its README fails (good job). You need to build as described here: https://www.ntop.org/guides/pf_ring/get_started/git_installation.html
Hence, install pfring 7.4.0 from package (the github repo sees a lot of changes even to tagged versions, so you need to choose a release). Install as described here:
cd PF_RING/kernel
make
sudo make install
It needs to be a system-wide install; my attempt to use ./configure --prefix did not result in a build in a custom dir.
Try to insert the module:
insmod ./pf_ring.ko
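To confirm that the insert worked, one can look for the module in /proc/modules. A minimal sketch; the function name pf_ring_loaded is an assumption for this example, and the optional file argument exists only so the check can be tried on a saved copy.

```shell
# Succeeds if a module called pf_ring is listed.
# $1: file in /proc/modules format (defaults to /proc/modules itself).
pf_ring_loaded() {
    awk '$1 == "pf_ring" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Usage: once loaded, the module also reports its settings under /proc.
#   pf_ring_loaded && cat /proc/net/pf_ring/info
```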
I have not yet tried the ZC drivers. pfring needs a custom libpcap to replace the system one:
cd PF_RING/userland/lib
./configure && make
sudo make install
cd ../libpcap
./configure && make
sudo make install
Then configure bro as described here: https://www.ntop.org/guides/pf_ring/thirdparty/bro.html
Use the following flags: ./configure --prefix=$HOME/bin/zeek --with-pcap=/usr/local/lib
I also tried --with-openssl=$HOME/bin/openssl_1.2.0, but that failed: zeek complains about an OpenSSL version <= 0.9.7. Apparently, however, there was a 1.0.2? on my system (probably from an earlier system-wide install).
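The flags above only cover the build. To spread the load over several cores, the workers also have to be told to use PF_RING, which BroControl reads from etc/node.cfg. A sketch under the assumption that BroControl is used; the function name pfring_worker, the worker name and the number of processes are placeholders for illustration.

```shell
# Print a node.cfg worker stanza that load-balances one interface
# over several worker processes via PF_RING.
# $1: worker name, $2: interface, $3: number of processes
pfring_worker() {
    printf '[%s]\ntype=worker\nhost=localhost\ninterface=%s\nlb_method=pf_ring\nlb_procs=%s\n' \
        "$1" "$2" "$3"
}

# Example (values are placeholders):
#   pfring_worker worker-1 enp134s0f0 10 >> "$HOME/bin/zeek/etc/node.cfg"
```

How many processes make sense per interface depends on the traffic and on how many cores should stay free for the manager, proxy and logger.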
The check with ldd (and checking the timestamp of the file) yields that zeek uses the new libpcap:
ldd bro | grep pcap
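The grep shows the whole line; for the timestamp check only the resolved path is needed. A small sketch that extracts it, assuming the usual "name => path (address)" layout of ldd output. The function name pcap_path is made up for this example.

```shell
# Print the path that the dynamic linker resolves libpcap to.
# Reads `ldd <binary>` output on stdin.
pcap_path() {
    awk '$1 ~ /^libpcap\.so/ && $2 == "=>" { print $3 }'
}

# Usage:
#   ldd bro | pcap_path
#   ls -l "$(ldd bro | pcap_path)"   # compare the timestamp with the build time
```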
Capture loss seems OK (very low):
ralph@ngara:~/bin/zeek/logs$ cat current/capture_loss.log | ../bin/bro-cut percent_lost
0.000573
0.000703
0.001729
0.000367
0.001342
0.001775
0.000962
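With more intervals than fit on a screen, a mean and maximum are easier to judge than the raw column. A minimal sketch that summarises the percent_lost values; the function name loss_summary is an assumption for this example.

```shell
# Print mean and maximum of a column of numbers read from stdin,
# one value per line.
loss_summary() {
    awk '
        NR == 1 || $1 + 0 > max { max = $1 + 0 }
        { sum += $1 }
        END { if (NR) printf "mean=%.6f max=%.6f\n", sum / NR, max }
    '
}

# Usage:
#   cat current/capture_loss.log | ../bin/bro-cut percent_lost | loss_summary
```

For the seven values above this prints mean=0.001064 max=0.001775.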
But conn.log has this:
cat current/conn.log | ../bin/bro-cut history | sort | uniq -c | sort -rn | less
8784648 S
1353357 D
753330 Dd
752877 ^d
743630 ^hadf
718049 SAD
382076 SADF
346580 ShADadFf
200313 ShADadfF
141600 -
135312 FA
133080 ^hdaf
126887 DAF
120249 R
104406 Sr
89951 ^f
88091 ShADdaFf
76630 ^had
75983 ShADadFfR
That's too few ShADadFf? The most frequent entries in weird.log:
243258 possible_split_routing
238643 data_before_established
101396 inappropriate_FIN
These seem to point at a configuration problem in the switch.
- Check with ICT if routing is correct or not.
- Deactivate hashing? ethtool -K enp134s0f0 rxhash off