@scyto
Last active June 5, 2024 17:39
Thunderbolt Networking Setup

Thunderbolt Networking

this gist is part of this series

NOTE: FOR THIS TO BE RELIABLE ON NODE RESTARTS YOU WILL NEED PROXMOX KERNEL 6.2.16-14-pve OR HIGHER

That kernel includes fixes for issues I reported to the thunderbolt / thunderbolt-net maintainers (I will take everyone's thanks now, lol)
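
One way to check a node against the kernel requirement above is to compare the running kernel with the minimum version. This is a minimal sketch, assuming a sort that supports -V (GNU coreutils does); the function name kernel_at_least is made up for this example.

```shell
#!/bin/sh
# kernel_at_least MIN CURRENT
# Succeeds when CURRENT sorts at or above MIN in version order.
# kernel_at_least is a hypothetical helper, not part of Proxmox.
kernel_at_least() {
    min=$1
    cur=$2
    lowest=$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

if kernel_at_least 6.2.16-14-pve "$(uname -r)"; then
    echo "kernel $(uname -r) meets the 6.2.16-14-pve minimum"
else
    echo "kernel $(uname -r) is older than 6.2.16-14-pve"
fi
```

The comparison is purely textual version ordering, so it says nothing about whether a given kernel build actually carries the fixes.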

Install LLDP - this is great to see what nodes can see which.

  • install lldpctl with apt install lldpd

Load Kernel Modules

  • add thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes i know it can sometimes work without, but the thunderbolt-net one has interesting behaviour, so do as i say - add both ;-)
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line
    2. save using Ctrl+X then Y then Enter
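
If you would rather script this step than edit the file by hand, the following is a minimal sketch. The helper name add_module and the script name in the comment are assumptions made for this example; it only appends a module when that exact line is not already present, and it assumes the file ends with a newline.

```shell
#!/bin/sh
# add_module FILE MODULE
# Appends MODULE to FILE unless FILE already has that exact line.
# add_module is a hypothetical helper written for this example.
add_module() {
    file=$1
    module=$2
    if ! grep -qxF -- "$module" "$file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$file"
    fi
}

# Only touch a real file when a target is passed, e.g. (as root):
#   sh add-tb-modules.sh /etc/modules
if [ "$#" -gt 0 ]; then
    add_module "$1" thunderbolt
    add_module "$1" thunderbolt-net
fi
```

Running it twice leaves the file unchanged the second time.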

Prepare /etc/network/interfaces

Doing this means we don't have to give each thunderbolt interface a manual IPv6 address, and that these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces

If you see any sections called thunderbolt0 or thunderbolt1 delete them at this point.

Now add the following (note we will set IP addresses in the UI):

allow-hotplug en05
iface en05 inet manual
        mtu 65520

iface en05 inet6 manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

iface en06 inet6 manual
        mtu 65520

If you see any thunderbolt sections delete them from the file before you save it.
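
To double-check that no thunderbolt sections are left, a quick search of the file can help. This is a minimal sketch; the helper name find_tb_stanzas and the script name in the comment are assumptions made for this example, and it only looks for auto, allow-hotplug and iface lines that name a thunderboltN interface.

```shell
#!/bin/sh
# find_tb_stanzas FILE
# Prints (with line numbers) any auto/allow-hotplug/iface lines that still
# name a thunderboltN interface. Exit status 0 means leftovers were found.
# find_tb_stanzas is a hypothetical helper written for this example.
find_tb_stanzas() {
    grep -nE '^[[:space:]]*(auto|allow-hotplug|iface)[[:space:]]+thunderbolt[0-9]+' "$1"
}

# Example: sh find-tb-stanzas.sh /etc/network/interfaces
if [ "$#" -gt 0 ]; then
    if find_tb_stanzas "$1"; then
        echo "delete the sections listed above before saving" >&2
    else
        echo "no thunderbolt sections found in $1"
    fi
fi
```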

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on sequence you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does - the command proved more useful to me than syslog or lspci for troubleshooting thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3. If you bought the same hardware as me these should be the same on all 3 units, but check rather than assume your PCI device paths match mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
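
The two link files differ only in the path and the name, so they can also be generated. This is a minimal sketch; write_tb_link and the script name in the comment are hypothetical, and the PCI paths passed at the bottom are the ones from this gist, so substitute the values from your own machine (Path= is matched against the udev ID_PATH property, which udevadm info /sys/class/net/thunderbolt0 should list).

```shell
#!/bin/sh
# write_tb_link DIR INDEX ID_PATH NAME
# Writes DIR/00-thunderboltINDEX.link matching the given udev path.
# write_tb_link is a hypothetical helper written for this example.
write_tb_link() {
    dir=$1
    index=$2
    id_path=$3
    name=$4
    cat > "$dir/00-thunderbolt$index.link" <<EOF
[Match]
Path=$id_path
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=$name
EOF
}

# Example invocation (as root), using the paths from this gist:
#   sh write-tb-links.sh /etc/systemd/network
if [ "$#" -gt 0 ]; then
    write_tb_link "$1" 0 pci-0000:00:0d.2 en05
    write_tb_link "$1" 1 pci-0000:00:0d.3 en06
fi
```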

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or cable insertion with whatever settings are in /etc/network/interfaces - this shouldn't need to be done, it seems like a bug in the way thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

  1. create a udev rule to detect cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. save the file

  3. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

# this brings the renamed interface up and reprocesses any settings in /etc/network/interfaces for the renamed interface
/usr/sbin/ifup en05

save the file and then

  4. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

# this brings the renamed interface up and reprocesses any settings in /etc/network/interfaces for the renamed interface
/usr/sbin/ifup en06

and save the file

  5. make both scripts executable with chmod +x /usr/local/bin/pve-en05.sh /usr/local/bin/pve-en06.sh
  6. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot)
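
The two scripts above differ only in the interface name. As an alternative sketch (not the method used in this gist), a single script can take the name as an argument, with udev passing it in through the %k substitution (the kernel name of the device). The script name pve-tb-ifup.sh, the combined rule shown in the comment, and the IFUP override are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical single script replacing pve-en05.sh and pve-en06.sh.
# Assumed udev rule to go with it:
#   ACTION=="move", SUBSYSTEM=="net", KERNEL=="en0[56]", RUN+="/usr/local/bin/pve-tb-ifup.sh %k"

# IFUP can be overridden for dry runs; the default is the path used above.
IFUP="${IFUP:-/usr/sbin/ifup}"

# tb_ifup NAME
# Brings NAME up only if it is one of the renamed thunderbolt interfaces.
tb_ifup() {
    case "$1" in
        en05|en06)
            "$IFUP" "$1"
            ;;
        *)
            echo "refusing to bring up unexpected interface: $1" >&2
            return 1
            ;;
    esac
}

# udev calls the script with the interface name as the first argument.
if [ "$#" -gt 0 ]; then
    tb_ifup "$1"
fi
```

Restricting the accepted names keeps a mistyped rule from running ifup on an unrelated interface.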

Enabling IP Connectivity

proceed to the next gist

@JamesTurland

I have the same problem @debanyw and @markverg - same PCIe device except a minor difference in the full path. I'm unable to link with permanentMACAddress as it changes (go figure...). I'm not sure what to do.

/devices/pci0000:00/0000:00:0d.2/consumer:pci:0000:00:07.1

@Allistah

Allistah commented May 15, 2024

I was able to get all of this networking done without much trouble. I only have a 2-node cluster right now on two NUC 13 Pro units. I'll be adding the third one in a couple months. I did all of mine using IPv4 and removed all of the IPv6 stuff. Shutting the cluster fully off and restarting node 1, then node 2 after 1 was up and running was no problem. The interfaces all came back up. Unplugging and re-plugging the cables also recovers without any trouble. I did use a fix in the comments in the next gist to get that recovery to work, which is here: https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085?permalink_comment_id=5021706#gistcomment-5021706

While implementing this specific gist/section, I noticed that the PCI port addresses on mine seemed to be backwards from what this gist was saying which I thought was interesting since we both have NUCs. Node 1, Port 1, en05 had a PCI address of "pci-0000:00:0d.3" instead of .2. Node 1, Port 2, en06 had the address ending in .2. This is opposite of what is detailed in this gist. So for those of you following this gist, make sure you look at what your machine is showing you for these addresses and don't copy exactly what is shown on this page. Each machine can be different from those of the author or myself.

I made sure to use the right kernel that gives 26Gb speeds (v6.5.13-5) and I have had no problems so far. Apparently the latest kernels available as of today (May 15th, 2024) have some issues with degraded speeds. Looks like there must be a regression in the kernel somewhere. Rolling back to 6.5.13-5 works great. Not sure how we would go about reporting this to get it fixed but it would probably be a good idea. (@scyto, how do we do this?)

@uvalleza

Just jumping onto the bandwagon to use the USB4 on my 2x Minisforum UM790 Pro and 1x Minisforum MS01. I got everything configured and working properly but I am noticing the below errors on my UM790 Pros. Anyone else encountering something similar? Is this normal?

May 22 01:48:37 pve3 kernel: usb usb6-port1: config error
May 22 01:48:39 pve3 kernel: usb usb6-port1: config error
May 22 01:48:41 pve3 pmxcfs[1075]: [status] notice: received log
May 22 01:48:43 pve3 kernel: usb usb6-port1: config error
May 22 01:48:47 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:48:51 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:48:51 pve3 kernel: usb usb6-port1: config error
May 22 01:48:55 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:48:55 pve3 kernel: usb usb6-port1: config error
May 22 01:48:59 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:48:59 pve3 kernel: usb usb6-port1: config error
May 22 01:49:03 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:49:03 pve3 kernel: usb usb6-port1: config error
May 22 01:49:07 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:49:07 pve3 kernel: usb usb6-port1: config error
May 22 01:49:11 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?
May 22 01:49:11 pve3 kernel: usb usb6-port1: config error
May 22 01:49:15 pve3 kernel: usb usb6-port1: Cannot enable. Maybe the USB cable is bad?

@luilegeant

@uvalleza I can confirm that I have the same issue with 3 UM790 Pro from minisforum.
I've tried Ubuntu server 22.04.4 LTS (kernel 5.15.0-107-generic); Ubuntu server 24.04 LTS (kernel 6.8.0-31-generic).
I also know that harassing the links (un-plug + re-plug) sometimes makes the connection work, which in the last 3 days gave me 2 cases of a fully working 3-node mesh.
I haven't figured out the conditions to reliably reproduce or fix the situation.

@uvalleza

uvalleza commented May 23, 2024

@luilegeant Seems like something normal then, but as of yesterday those notifications have gone away for me after doing the fix in the comment linked below. The thing for me is that my connection was never broken by them. I did encounter a broken connection in the beginning, but once I ran systemctl restart frr.service / unplugged and replugged, everything worked. At this point, I suspect that post-up /usr/bin/systemctl restart frr.service from /etc/network/interfaces was not running successfully, but I'm unsure.

https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085?permalink_comment_id=5021706#gistcomment-5021706

Oh and last thing, I did have to go back and fix the order of how everything was connected. I followed the below.

using the numbers printed on the case of the intel13 nucs connect cables as follows (this is important):

  • node 1 port 1 > node 2 port 2
  • node 2 port 1 > node 3 port 2
  • node 3 port 1 > node 1 port 2
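
The cabling above is a ring: port 1 of each node goes to port 2 of the next node, and the last node wraps around to the first. As a small sketch of that rule (the helper name ring_cabling is made up for this example, and ring sizes other than 3 are an extrapolation not tested in this thread):

```shell
#!/bin/sh
# ring_cabling N
# Prints one cable per node: port 1 of node i goes to port 2 of the next
# node, and the last node wraps around to node 1.
# ring_cabling is a hypothetical helper written for this example.
ring_cabling() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        next=$(( i % n + 1 ))
        echo "node $i port 1 > node $next port 2"
        i=$(( i + 1 ))
    done
}

ring_cabling 3
```

For 3 nodes this prints the same three connections as the list above.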

My current errors are now the below.... Would like to see if the above works for you and if you start reporting similar issues like me.

May 23 06:34:10 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 06:41:12 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 06:44:55 pve systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
May 23 06:44:55 pve systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
May 23 06:44:55 pve systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
May 23 06:47:55 pve fabricd[1074]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
May 23 06:48:55 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 06:55:37 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 07:01:47 pve pmxcfs[1196]: [dcdb] notice: data verification successful
May 23 07:02:20 pve fabricd[1074]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
May 23 07:03:20 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 07:09:53 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 07:16:36 pve fabricd[1074]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
May 23 07:17:01 pve CRON[480274]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 23 07:17:01 pve CRON[480275]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 23 07:17:01 pve CRON[480274]: pam_unix(cron:session): session closed for user root
May 23 07:17:36 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 07:24:17 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 07:30:51 pve fabricd[1074]: [NBV6R-CM3PT] OpenFabric: Needed to resync LSPDB using CSNP!
May 23 07:31:51 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers
May 23 07:38:38 pve fabricd[1074]: [QBAZ6-3YZR3] OpenFabric: Could not find two T0 routers

Just out of curiosity are you getting similar speeds like the below? I notice multiple people with MS-01's are getting higher so i am wondering if it's a limitation on the USB4 of the UM790

root@pve2:~# iperf3 -c 10.0.0.81 -B 10.0.0.82
Connecting to host 10.0.0.81, port 5201
[ 5] local 10.0.0.82 port 58585 connected to 10.0.0.81 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.39 GBytes 12.0 Gbits/sec 12 2.37 MBytes
[ 5] 1.00-2.00 sec 1.40 GBytes 12.0 Gbits/sec 0 2.50 MBytes
[ 5] 2.00-3.00 sec 1.39 GBytes 12.0 Gbits/sec 0 2.50 MBytes
[ 5] 3.00-4.00 sec 1.39 GBytes 11.9 Gbits/sec 0 2.50 MBytes
[ 5] 4.00-5.00 sec 1.39 GBytes 12.0 Gbits/sec 0 2.50 MBytes
[ 5] 5.00-6.00 sec 1.39 GBytes 12.0 Gbits/sec 0 2.50 MBytes
[ 5] 6.00-7.00 sec 1.39 GBytes 11.9 Gbits/sec 0 2.50 MBytes
[ 5] 7.00-8.00 sec 1.39 GBytes 11.9 Gbits/sec 0 2.50 MBytes
[ 5] 8.00-9.00 sec 1.39 GBytes 11.9 Gbits/sec 0 2.50 MBytes
[ 5] 9.00-10.00 sec 1.39 GBytes 12.0 Gbits/sec 0 2.50 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 13.9 GBytes 12.0 Gbits/sec 12 sender
[ 5] 0.00-10.00 sec 13.9 GBytes 12.0 Gbits/sec receiver
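
When reading these numbers, note that the Transfer and Bitrate columns use different unit bases: as far as I know iperf3 counts GBytes in powers of two (2^30 bytes) and Gbits/sec in powers of ten, which is why 13.9 GBytes in 10 seconds shows as roughly 12 Gbits/sec rather than 11.1. A small sketch of the conversion under that assumption (gbytes_to_gbits is a made-up helper name):

```shell
#!/bin/sh
# gbytes_to_gbits GBYTES SECONDS
# Converts an iperf3 "GBytes" transfer (assumed to be 2^30 bytes each)
# over SECONDS into decimal Gbits/sec (10^9 bits per second).
# gbytes_to_gbits is a hypothetical helper written for this example.
gbytes_to_gbits() {
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", gb * 1073741824 * 8 / s / 1e9 }'
}

gbytes_to_gbits 13.9 10
```

The result lands just under the reported 12.0 because the 13.9 GBytes figure in the summary is itself rounded.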

@damitjimii

damitjimii commented Jun 1, 2024

Hi peeps, @markverg @scyto
re the pci short name being the same;
I think this may be due to having a thunderbolt bridge;
See the output below contrasting the MS-01 and the ASUS z690-i
MS-01:

root@pve01:~# lspci
00:00.0 Host bridge: Intel Corporation Device a706
00:01.0 PCI bridge: Intel Corporation Device a70d
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
00:06.0 PCI bridge: Intel Corporation Raptor Lake PCIe 4.0 Graphics Port
00:06.2 PCI bridge: Intel Corporation Device a73d
00:07.0 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port
00:07.2 PCI bridge: Intel Corporation Raptor Lake-P Thunderbolt 4 PCI Express Root Port
00:0d.0 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 USB Controller
00:0d.2 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI
00:0d.3 USB controller: Intel Corporation Raptor Lake-P Thunderbolt 4 NHI
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:16.3 Serial controller: Intel Corporation Alder Lake AMT SOL Redirection (rev 01)
00:1c.0 PCI bridge: Intel Corporation Alder Lake-P PCH PCIe Root Port (rev 01)
00:1c.4 PCI bridge: Intel Corporation Device 51bc (rev 01)
00:1d.0 PCI bridge: Intel Corporation Device 51b2 (rev 01)
00:1d.3 PCI bridge: Intel Corporation Device 51b3 (rev 01)
00:1f.0 ISA bridge: Intel Corporation Raptor Lake LPC/eSPI Controller (rev 01)
00:1f.3 Audio device: Intel Corporation Raptor Lake-P/U/H cAVS (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
01:00.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
02:03.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
02:07.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
03:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
04:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
05:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP120 2 (rev 01)
06:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
06:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
5b:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)
5c:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP120 2 (rev 01)
5d:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-LM (rev 04)
5e:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU

Asus z690-i (i5 cpu):
root@pve07:# lspci
00:00.0 Host bridge: Intel Corporation Device 4648 (rev 02)
00:01.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation AlderLake-S GT1 (rev 0c)
00:0a.0 Signal processing controller: Intel Corporation Platform Monitoring Technology (rev 01)
00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller
00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #2 (rev 11)
00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
00:17.0 SATA controller: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] (rev 11)
00:1a.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Device 7ac0 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 (rev 11)
00:1c.1 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #2 (rev 11)
00:1c.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 (rev 11)
00:1d.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 (rev 11)
00:1d.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Z690 Chipset LPC/eSPI Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev 03)
06:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
07:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
07:01.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
07:02.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
07:03.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
08:00.0 USB controller: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
3d:00.0 USB controller: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
71:00.0 Non-Volatile memory controller: INNOGRIT Corporation NVMe SSD Controller IG5236 (rev 01)
root@pve07:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
link/ether e8:9c:25:79:c1:f6 brd ff:ff:ff:ff:ff:ff
5: wlo1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether c8:5e:a9:75:df:c8 brd ff:ff:ff:ff:ff:ff
altname wlp0s20f3
6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether e8:9c:25:79:c1:f6 brd ff:ff:ff:ff:ff:ff
9: thunderbolt0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 12:e6:22:5c:20:3e brd ff:ff:ff:ff:ff:ff
10: thunderbolt1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:e6:22:5c:20:3e brd ff:ff:ff:ff:ff:ff

I will attempt to use the MAC like Mark, thanks for the work, hope this info helps.
Clay

@damitjimii

Hi peeps; for anyone like me with a non-unique PCI device, you can use the MAC address to match and the easiest place to get it is "ip link show"
As well, don't use quotes around the address; see below for example:

[Match]
MACAddress=aa:bb:cc:dd:ee:ff
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06

Great thread peeps.
Clay.

@JamesTurland

Hi peeps; for anyone like me with a non-unique PCI device, you can use the MAC address to match and the easiest place to get it is "ip link show" As well, don't use quotes around the address; see below for example:

[Match]
MACAddress=aa:bb:cc:dd:ee:ff
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06

Great thread peeps. Clay.

Unfortunately my MAC and permanent MAC change on each boot... Go figure.

@damitjimii

@JamesTurland maybe the whole path? Or maybe look at the udevadm info -ap /whole pci path + /net/thunderbolt0 to see if there is something static in one of the path/device levels.

@scyto
Author

scyto commented Jun 3, 2024

Unfortunately my MAC and permanent MAC change on each boot... Go figure.

yes thats what I observed too, also the thunderbolt0 and thunderbolt1 names are not consistent, whichever port comes up first with cable will get 0 and the next port gets 1 - so this can change based on order cables are plugged in or whatever weird race condition on the bus / kernel happens.....

@scyto
Author

scyto commented Jun 3, 2024

@damitjimii forget the dmesg output, it will lead you astray - what do you see in the udevadm tool for each port? That's the key, because you are creating match conditions based on the udev metadata for those paths. By taking the paths you see in the udevadm monitor output and inspecting each one explicitly with udevadm, you can try to find other consistent identifiers to match on.
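
One way to look for consistent identifiers is to save the udevadm output on two different boots (or cable insertions) and keep only the lines that appear in both. This is a minimal sketch; stable_lines, the script name and the capture file names in the comments are made up for this example.

```shell
#!/bin/sh
# stable_lines FILE_A FILE_B
# Prints the lines present in both captures, i.e. candidates for a match
# key that did not change between the two runs.
# stable_lines is a hypothetical helper written for this example.
stable_lines() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Assumed workflow: capture once per boot, e.g.
#   udevadm info -a -p /sys/class/net/thunderbolt0 > /root/tb-boot1.txt
# then compare two captures:
#   sh stable-lines.sh /root/tb-boot1.txt /root/tb-boot2.txt
if [ "$#" -eq 2 ]; then
    stable_lines "$1" "$2"
fi
```

Anything that survives the comparison still has to be something a .link file or udev rule can actually match on, so treat the output as a list of candidates only.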

@luilegeant

luilegeant commented Jun 4, 2024

@uvalleza & all: Quick summary from my last few days using 3 um790 pro from minisforum:

  • Context: Ubuntu 24.04 LTS on 3 "nuc" with AMD cpu. (I was told: usb-4 doesn't necessarily mean thunderbolt-3, the spec is a pick & choose; on top of that, until recently thunderbolt was an Intel-only feature)
  • The speed i get on direct link is around ~12Gbits like you do
  • The frr needs a "reload" after boot (no need to play with the wires, unless the logs show invalid config for usb port x), see code snippet below
  • Downgrading the bios/uefi from 1.09 to 1.07 gave me less "invalid config for usb x" kind of errors and more reliability after reboots (i still have to unplug-replug some wires sometimes) also, it seems that my speed went from 10-11gbits to 12-13 gbits, but i can't really confirm. => what bios/uefi version are you running with ?
  • I had to stop using encrypted boot drives as it required me to unplug the thunderbolt links to let the hdmi work, then replug it all.
  • cables are indeed placed the same way you do
  • my frr setup seems to be working, but once I remove 1 of the link, the speed is about 2Mbits (yes mega bits) when it needs to do 1 more hop through the 2 other thunderbolt links => i have yet to figure out that part

To auto-reload the frr configuration after reboot (required otherwise it fails to see the thunderbolt links and I get 3 independent nodes that don't see each other via vtysh -c "show openfabric topology")
Requirement: have your interfaces renamed (see "tbt" in script) as explained in the first post by scyto (don't use hyphen in interface names, it wasn't working for me)

#!/bin/sh
# Delayed start script to tell frr to reload ensuring that it sees thunderbolt links towards other nodes.
# condition: is there any tbt network interface and frr service up
COUNTER=0
while [ ${COUNTER} -lt 5 ]; do
	sleep 1;
	TEST=$(ip a | grep ": tbt" | grep "UP" | awk 'BEGIN { ORS=""}; {print $2}')
	if [ ${#TEST} -ge 2 ]; then
		TEST_SVC=$(service frr status | grep "active (running)")
		if [ ${#TEST_SVC} -ge 2 ]; then
			service frr reload;
			echo "frr service reload request sent"
			exit 0;
		fi
	fi
	COUNTER=$((COUNTER+1));
done
echo "Failed to request frr service reload: request NOT sent"
exit 1;

The systemd unit that runs the script:

[Unit]
After=network.target

[Service]
ExecStart=/usr/local/bin/restart-frr.sh

[Install]
WantedBy=default.target

Note: The script is called restart, but after some testing, I realised that reload was enough.

To all: thank you for sharing your experience, its a great help & motivation to figure out what's going sideways 😄
