@scyto
Last active April 22, 2024 03:54
Thunderbolt Networking Setup

Thunderbolt Networking

this gist is part of this series

NOTE: FOR THIS TO BE RELIABLE ON NODE RESTARTS YOU WILL NEED PROXMOX KERNEL 6.2.16-14-pve OR HIGHER

This kernel includes fixes for issues I reported to the thunderbolt / thunderbolt-net maintainers (I will take everyone's thanks now, lol)

Install LLDP - this is a great way to see which nodes can see each other.

  • install lldpd (which provides the lldpctl command) with apt install lldpd

Load Kernel Modules

  • add the thunderbolt and thunderbolt-net kernel modules (this must be done on all nodes - yes, I know it can sometimes work without them, but the thunderbolt-net one has interesting behaviour, so do as I say - add both ;-)
    1. nano /etc/modules and add the modules at the bottom of the file, one on each line
    2. save using ctrl-x then y then enter
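
If you would rather script this step than edit the file by hand, the following is a minimal sketch. The function name add_module is hypothetical (not part of the original steps), and it assumes the file has one module name per line and ends with a newline.

```shell
# Append a kernel module name to a modules file, but only if it is not
# already listed, so running this twice does not create duplicates.
#   $1 = module name
#   $2 = modules file (assumption: defaults to /etc/modules as in the steps above)
# add_module is a hypothetical helper name, not an existing tool.
add_module() {
    module="$1"
    file="${2:-/etc/modules}"
    # -x: match the whole line, -F: treat the name as a literal string
    if ! grep -qxF -- "$module" "$file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$file"
    fi
}
```

Run as root on every node, this would be add_module thunderbolt followed by add_module thunderbolt-net. Matching whole lines keeps thunderbolt from being treated as already present just because thunderbolt-net is listed.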

Prepare /etc/network/interfaces

Doing this means we don't have to give each thunderbolt interface a manual IPv6 address, and that these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces

If you see any sections called thunderbolt0 or thunderbolt1 delete them at this point.

Now add the following (note we will set IP addresses in the UI):

allow-hotplug en05
iface en05 inet manual
        mtu 65520

iface en05 inet6 manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

iface en06 inet6 manual
        mtu 65520

If you see any thunderbolt sections delete them from the file before you save it.
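
A quick way to check for leftover sections is to search the file for stanza lines that still use the default thunderboltN names. This is only a sketch: find_tb_stanzas is a hypothetical helper name, and it only looks at auto, allow-hotplug and iface lines.

```shell
# Print (with line numbers) any stanza lines in an interfaces file that
# still refer to the default thunderboltN interface names.
#   $1 = interfaces file (assumption: defaults to /etc/network/interfaces)
# find_tb_stanzas is a hypothetical helper name, not an existing tool.
find_tb_stanzas() {
    file="${1:-/etc/network/interfaces}"
    grep -nE '^[[:space:]]*(auto|allow-hotplug|iface)[[:space:]]+thunderbolt[0-9]+' "$file"
}
```

No output (and a non-zero exit status from grep) means nothing was found; any lines printed point at sections that should be deleted before saving.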

Rename Thunderbolt Connections

This is needed as proxmox doesn't recognize the thunderbolt interface name. There are various methods to do this. This method was selected after trial and error because:

  • the thunderboltX naming is not fixed to a port (it seems to be based on the sequence in which you plug the cables in)
  • the MAC address of the interfaces changes with most cable insertion and removal events
  1. use the udevadm monitor command to find your device IDs when you insert and remove each TB4 cable. Yes, you can use other ways to do this; I recommend this one as it is a great way to understand what udev does - the command proved more useful to me than the syslog or lspci command for troubleshooting thunderbolt issues and behaviours. In my case my two PCI paths are 0000:00:0d.2 and 0000:00:0d.3; if you bought the same hardware, these should be the same on all 3 units. If your hardware differs, don't assume your PCI device paths will be the same as mine.

  2. create a link file using nano /etc/systemd/network/00-thunderbolt0.link and enter the following content:

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05
  3. create a second link file using nano /etc/systemd/network/00-thunderbolt1.link and enter the following content:
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06
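
To pull the PCI addresses for the Path= lines out of captured udevadm monitor output, something like the sketch below may help. tb_pci_paths is a hypothetical helper name, and it assumes the address you want is the PCI device that sits directly above the domainN part of the device path, as it does in the output shown in this gist.

```shell
# Read `udevadm monitor` output on stdin and print the unique PCI
# addresses found directly above a Thunderbolt domain, prefixed with
# "pci-" to match the Path= lines in the link files.
# tb_pci_paths is a hypothetical helper name, not an existing tool.
tb_pci_paths() {
    sed -nE 's#.*/([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])/domain[0-9]+(/.*)?$#pci-\1#p' | sort -u
}
```

Because sort only prints once its input ends, it is easiest to capture the events to a file first (udevadm monitor > /tmp/udev.log, plug and unplug each cable, then ctrl-c) and afterwards run tb_pci_paths < /tmp/udev.log.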

Set Interfaces to UP on reboots and cable insertions

This section ensures that the interfaces will be brought up at boot or cable insertion with whatever settings are in /etc/network/interfaces - this shouldn't need to be done, it seems like a bug in the way thunderbolt networking is handled (I assume this is Debian-wide but haven't checked).

  1. create a udev rule to detect cable insertion using nano /etc/udev/rules.d/10-tb-en.rules with the following content:
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"
  2. save the file

  3. create the first script referenced above using nano /usr/local/bin/pve-en05.sh with the following content:

#!/bin/bash

# this brings the renamed interface up and reprocesses any settings in /etc/network/interfaces for the renamed interface
/usr/sbin/ifup en05

save the file and then

  4. create the second script referenced above using nano /usr/local/bin/pve-en06.sh with the following content:
#!/bin/bash

# this brings the renamed interface up and reprocesses any settings in /etc/network/interfaces for the renamed interface
/usr/sbin/ifup en06

and save the file

  5. make both scripts executable with chmod +x /usr/local/bin/*.sh
  6. Reboot (restarting networking, init 1 and init 3 are not good enough, so reboot)
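
After the reboot, one way to confirm that the renamed interfaces exist and picked up the MTU is to read their state from sysfs. This is a sketch rather than part of the original steps: tb_iface_status and the SYSFS_NET override are hypothetical names, and the operstate a working thunderbolt link reports may vary.

```shell
# Print operational state and MTU for each interface named on the
# command line, reading from sysfs. Returns non-zero if any is missing.
# SYSFS_NET is a hypothetical override (useful for testing); by default
# the kernel's /sys/class/net is used.
# tb_iface_status is a hypothetical helper name, not an existing tool.
tb_iface_status() {
    net_dir="${SYSFS_NET:-/sys/class/net}"
    rc=0
    for iface in "$@"; do
        if [ -d "$net_dir/$iface" ]; then
            printf '%s state=%s mtu=%s\n' "$iface" \
                "$(cat "$net_dir/$iface/operstate")" \
                "$(cat "$net_dir/$iface/mtu")"
        else
            # the interface only exists while a peer is connected
            printf '%s missing\n' "$iface"
            rc=1
        fi
    done
    return "$rc"
}
```

For the names used in this guide the call would be tb_iface_status en05 en06. A missing interface usually just means the cable on that port has no live peer yet.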

Enabling IP Connectivity

proceed to the next gist

@scyto
Author

scyto commented Oct 16, 2023

as connecting cables must be in the same subnet, did you do it like this?

and to be clear, each port pair needs to be in a unique subnet, and if you have 3 nodes you will still need to implement routing between those 3 port pair subnets

personally i think implementing FRR as i linked is much easier and better in the long term

@scyto
Author

scyto commented Oct 16, 2023

for reference you can see in the way i do it - no need for an IP on the thunderbolt interfaces at all (en05 / en06)

theoretically this also means i don't really need the link files any more as static names wouldn't be needed... i just never went back and unwound the link files as I prefer the predictability of a port being the same thing every time :-)

image

@markverg

That’s highly appreciated! I will have another look and try tomorrow

@mzinner

mzinner commented Oct 24, 2023

Hi @scyto,

would you mind removing the section Set IP addresses via UI from above? Setting IP addresses in the UI for en05 and en06 seems to conflict with the OpenFabric routing and cost me days of trying to figure out why things were only partially/randomly working. Only when I saw your Proxmox network screenshot above, which has cleared CIDR entries for en05 and en06, did I realize I needed to remove the old entries I had from your OSPF Routing On Mesh network approach that is now deprecated.

Thanks,
MikeZ

@scyto
Author

scyto commented Oct 24, 2023

@mzinner great catch, sorry that was left over from older version

@AdriSchmi

AdriSchmi commented Dec 14, 2023

I have the 3 NPB7 set up and in a cluster but the thunderbolt connections don't work. When I plug them together I don't see any events in udevadm monitor. I added the modules thunderbolt-net and thunderbolt. I am new to Linux and need some guidance to troubleshoot this issue.

Ok, I am a little bit further. When I plug the Minisforum into my laptop with USB4 it shows the events in udevadm monitor, but not when I plug it into the other Minisforum PC.

@scyto
Author

scyto commented Dec 19, 2023

@AdriSchmi

that usually means the driver isn't loading or somethings wrong - you should also see items in /var/log/messages

  1. are you sure you are using TB cables (and not generic USB-C cables)?
  2. i don't see any evidence it actually has TB - the ports are marked USB4 - i don't know what this means in terms of how you get USB4's xdomain networking working, sorry - it should be possible... i do know there were LOTS of xdomain fixes that went into the latest linux kernel tree - i would need to go look at the linux kernel commits to know if they have made their way into the kernel version proxmox uses

i know this is no help now, but in future i recommend always paying the extra for true certified TB4; with USB4 so much of the spec is optional....

Yup i checked their product page - its USB4 not TB - the difference does matter, i note several vendors have been loose implying USB4 is TB4 - it isn't.

@AdriSchmi

@scyto but it works flawlessly when I plug in a Thunderbolt 4 laptop. I see the scripts load and the interfaces come up, so I think it's only a Linux software problem. I tried 3 different Thunderbolt cables; all work between my laptop and the NPB7 but not between the NPB7 and NPB7.

@AdriSchmi

@scyto how do i see items in /var/log/messages. The file doesnt exist.

@scyto
Author

scyto commented Dec 20, 2023

sorry, thought everyone using proxmox knew all the message logs are in the system journal

you can see all kernel entries for thunderbolt with something like journalctl -k | grep thunder - the -k shows kernel messages and the grep filters to just those with thunder in the line

if you are used to /var/log having different logs in it, learn to love the journalctl command as this seems to be the way many distros are going over time....

you may need to use USB or USB4 instead of thunder (or just don't filter the kernel messages at all to see if anything important is in there)
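
Since grep is case sensitive by default, a filter that ignores case and covers both names in one pass can save a few retries. A minimal sketch, where tb_kernel_lines is a hypothetical helper name and the default pattern is just a starting point:

```shell
# Filter kernel log lines on stdin for Thunderbolt or USB4 messages,
# ignoring case. An alternative extended regex can be passed as $1.
# tb_kernel_lines is a hypothetical helper name, not an existing tool.
tb_kernel_lines() {
    pattern="${1:-thunderbolt|usb4}"
    grep -iE -- "$pattern"
}
```

Typical use would be journalctl -k | tb_kernel_lines, or journalctl -k | tb_kernel_lines 'xdomain|retimer' to narrow things down further.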

@scyto
Author

scyto commented Dec 20, 2023

here is an example output on my node

root@pve1:~# journalctl -k | grep thunder
Nov 26 19:33:18 pve1 kernel: ACPI: bus type thunderbolt registered
Nov 26 19:33:18 pve1 kernel: thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 19:33:18 pve1 kernel: thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 19:33:21 pve1 kernel: thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
Nov 26 19:33:21 pve1 kernel: thunderbolt 0-1: Intel Corp. pve3
Nov 26 19:33:21 pve1 kernel: thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
Nov 26 19:33:22 pve1 kernel: thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
Nov 26 19:33:22 pve1 kernel: thunderbolt 1-1: Intel Corp. pve2
Nov 26 19:33:22 pve1 kernel: thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
Nov 26 19:50:07 pve1 kernel: thunderbolt 1-0:1.1: retimer disconnected
Nov 26 19:50:07 pve1 kernel: thunderbolt 1-1: host disconnected
Nov 26 19:50:11 pve1 kernel: thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 19:50:18 pve1 kernel: thunderbolt 1-0:1.1: retimer disconnected
Nov 26 19:50:26 pve1 kernel: thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 19:54:23 pve1 kernel: thunderbolt 1-0:1.1: retimer disconnected
Nov 26 19:54:31 pve1 kernel: thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 19:54:49 pve1 kernel: thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
Nov 26 19:54:49 pve1 kernel: thunderbolt 1-1: Intel Corp. pve2
Nov 26 19:54:49 pve1 kernel: thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
Nov 26 19:59:58 pve1 kernel: thunderbolt-net 1-1.0 en06: ThunderboltIP login timed out
Nov 26 20:08:41 pve1 kernel: thunderbolt 1-0:1.1: retimer disconnected
Nov 26 20:08:41 pve1 kernel: thunderbolt 1-1: host disconnected
Nov 26 20:08:46 pve1 kernel: thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 20:09:11 pve1 kernel: thunderbolt 1-1: new host found, vendor=0x8086 device=0x1
Nov 26 20:09:11 pve1 kernel: thunderbolt 1-1: Intel Corp. pve2
Nov 26 20:09:11 pve1 kernel: thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
Nov 26 20:14:20 pve1 kernel: thunderbolt-net 1-1.0 en06: ThunderboltIP login timed out
Nov 26 20:17:46 pve1 kernel: thunderbolt 0-0:1.1: retimer disconnected
Nov 26 20:17:46 pve1 kernel: thunderbolt 0-1: host disconnected
Nov 26 20:17:50 pve1 kernel: thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Nov 26 20:18:19 pve1 kernel: thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
Nov 26 20:18:19 pve1 kernel: thunderbolt 0-1: Intel Corp. pve3
Nov 26 20:18:19 pve1 kernel: thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
Nov 26 20:23:28 pve1 kernel: thunderbolt-net 0-1.0 en05: ThunderboltIP login timed out

@scyto
Author

scyto commented Dec 20, 2023

> @scyto but it works flawlessly when I plug in a Thunderbolt 4 laptop. I see the scripts load and the interfaces come up, so I think it's only a Linux software problem. I tried 3 different Thunderbolt cables; all work between my laptop and the NPB7 but not between the NPB7 and NPB7.

quite possibly (not) working as expected; as i told you, the USB4 support is still very much in flux in upstream kernels, you may need to just wait until one of those later kernels is the one proxmox supports

one way to test is to use a very up-to-date debian or ubuntu distro (i.e. one with a later linux kernel than proxmox) and see if that works - if it doesn't, then your next step would be to compile your own kernel from source and test

and if that doesn't work then you would need to file an extremely high quality bug with the kernel maintainers

---edit---
when you plug the laptop in, take note of the udev and kernel logs and see what kernel module is actually loaded (it may not be thunderbolt, and you may have to manually add that module or modules to the modules file)
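
One way to see which modules are currently loaded is to look them up in /proc/modules, which is what lsmod reads. The sketch below assumes the drivers are built as loadable modules (built-in drivers do not appear in that list); tb_modules_loaded and the PROC_MODULES override are hypothetical names.

```shell
# Report whether each named module appears in a /proc/modules-style list.
# Dashes are converted to underscores because the kernel lists
# thunderbolt-net as thunderbolt_net.
# PROC_MODULES is a hypothetical override (useful for testing).
# tb_modules_loaded is a hypothetical helper name, not an existing tool.
tb_modules_loaded() {
    list="${PROC_MODULES:-/proc/modules}"
    rc=0
    for name in "$@"; do
        mod=$(printf '%s' "$name" | tr '-' '_')
        # each line of the list starts with the module name and a space
        if grep -q "^$mod " "$list"; then
            printf '%s loaded\n' "$mod"
        else
            printf '%s not loaded\n' "$mod"
            rc=1
        fi
    done
    return "$rc"
}
```

For this guide the check would be tb_modules_loaded thunderbolt thunderbolt-net, run once with the laptop attached and once with the other node attached so the two results can be compared.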

@AdriSchmi

there are no logs in journalctl when i plug an NPB7 into an NPB7.
these are the logs when i plug in my laptop:

root@nuc1:~# journalctl -k | grep thunder
Dec 21 17:20:38 nuc1 kernel: ACPI: bus type thunderbolt registered
Dec 21 17:21:11 nuc1 kernel: WARNING: CPU: 4 PID: 325 at drivers/thunderbolt/ctl.c:217 check_config_address+0xb9/0xd0 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: mac80211 btbcm drm_display_helper snd_hwdep aesni_intel btintel cec snd_pcm crypto_simd processor_thermal_device_pci btmtk rc_core processor_thermal_device mei_hdcp mei_pxp snd_timer cryptd processor_thermal_rfim cfg80211 bluetooth drm_kms_helper snd processor_thermal_mbox cmdlinepart mei_me rapl intel_rapl_msr ecdh_generic processor_thermal_rapl spi_nor libarc4 soundcore i2c_algo_bit intel_cstate ecc int3400_thermal mei intel_rapl_common wmi_bmof pcspkr mtd zfs(PO) acpi_thermal_rel int340x_thermal_zone acpi_tad acpi_pad mac_hid spl(O) vhost_net vhost vhost_iotlb tap thunderbolt_net drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme xhci_pci intel_lpss_pci xhci_pci_renesas i2c_i801 nvme_core spi_intel_pci ahci intel_lpss video crc32_pclmul xhci_hcd igc thunderbolt i2c_smbus nvme_common spi_intel libahci idma64 wmi pinctrl_tigerlake
Dec 21 17:21:11 nuc1 kernel: Workqueue: thunderbolt0 tb_handle_hotplug [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: RIP: 0010:check_config_address+0xb9/0xd0 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: ? check_config_address+0xb9/0xd0 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: ? check_config_address+0xb9/0xd0 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: ? check_config_address+0xb9/0xd0 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: tb_cfg_read_raw+0x280/0x2f0 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: tb_cfg_read+0x54/0x120 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: __tb_port_enable+0xe3/0x200 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: tb_port_enable+0x13/0x20 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: tb_handle_hotplug+0x434/0x960 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: WARNING: CPU: 4 PID: 325 at drivers/thunderbolt/ctl.c:1090 tb_cfg_read+0x7b/0x120 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: mac80211 btbcm drm_display_helper snd_hwdep aesni_intel btintel cec snd_pcm crypto_simd processor_thermal_device_pci btmtk rc_core processor_thermal_device mei_hdcp mei_pxp snd_timer cryptd processor_thermal_rfim cfg80211 bluetooth drm_kms_helper snd processor_thermal_mbox cmdlinepart mei_me rapl intel_rapl_msr ecdh_generic processor_thermal_rapl spi_nor libarc4 soundcore i2c_algo_bit intel_cstate ecc int3400_thermal mei intel_rapl_common wmi_bmof pcspkr mtd zfs(PO) acpi_thermal_rel int340x_thermal_zone acpi_tad acpi_pad mac_hid spl(O) vhost_net vhost vhost_iotlb tap thunderbolt_net drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme xhci_pci intel_lpss_pci xhci_pci_renesas i2c_i801 nvme_core spi_intel_pci ahci intel_lpss video crc32_pclmul xhci_hcd igc thunderbolt i2c_smbus nvme_common spi_intel libahci idma64 wmi pinctrl_tigerlake
Dec 21 17:21:11 nuc1 kernel: Workqueue: thunderbolt0 tb_handle_hotplug [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: RIP: 0010:tb_cfg_read+0x7b/0x120 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: ? tb_cfg_read+0x7b/0x120 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: ? tb_cfg_read+0x7b/0x120 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: __tb_port_enable+0xe3/0x200 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: tb_port_enable+0x13/0x20 [thunderbolt]
Dec 21 17:21:11 nuc1 kernel: tb_handle_hotplug+0x434/0x960 [thunderbolt]
Dec 21 17:21:47 nuc1 kernel: thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Dec 21 17:21:48 nuc1 kernel: thunderbolt 0-0:1.1: retimer disconnected
Dec 21 17:25:04 nuc1 kernel: thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
Dec 21 17:25:17 nuc1 kernel: thunderbolt 0-1: new host found, vendor=0x1 device=0x1
Dec 21 17:25:17 nuc1 kernel: thunderbolt 0-1: Intel Corp. ADRIAN-SCHMIDBE
Dec 21 17:25:17 nuc1 kernel: thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
Dec 21 17:25:56 nuc1 kernel: thunderbolt 0-0:1.1: retimer disconnected
Dec 21 17:25:56 nuc1 kernel: thunderbolt 0-1: host disconnected

and the udevadm monitor events
KERNEL[269.086968] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV [269.089961] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[269.097503] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
KERNEL[269.097518] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
UDEV [269.097849] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
UDEV [269.098210] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
KERNEL[281.597543] change /0-1 (thunderbolt)
KERNEL[281.598342] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1 (thunderbolt)
KERNEL[281.598355] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
KERNEL[281.598362] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt0 (net)
KERNEL[281.598366] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[281.598370] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[281.598430] bind /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
UDEV [281.600470] change /0-1 (thunderbolt)
UDEV [281.607491] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1 (thunderbolt)
UDEV [281.607910] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
KERNEL[281.609142] move /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05 (net)
UDEV [281.623602] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05 (net)
UDEV [281.623864] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt0/queues/rx-0 (queues)
UDEV [281.624246] add /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/thunderbolt0/queues/tx-0 (queues)
UDEV [281.624616] bind /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
UDEV [281.860964] move /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05 (net)
KERNEL[320.627128] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
KERNEL[320.627139] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
KERNEL[320.627146] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[320.628295] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05/queues/rx-0 (queues)
KERNEL[320.628303] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05/queues/tx-0 (queues)
KERNEL[320.628404] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05 (net)
UDEV [320.630211] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
UDEV [320.630367] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
UDEV [320.630502] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05/queues/rx-0 (queues)
UDEV [320.630527] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV [320.630692] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05/queues/tx-0 (queues)
UDEV [320.631697] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05 (net)
KERNEL[320.724332] unbind /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
KERNEL[320.724343] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
KERNEL[320.724347] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1 (thunderbolt)
UDEV [320.724688] unbind /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
UDEV [320.724931] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0 (thunderbolt)
UDEV [320.725119] remove /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1 (thunderbolt)

Where do i see what module is loaded?

@scyto
Author

scyto commented Dec 23, 2023

> Where do i see what module is loaded?

It's on every line, at the end (in parentheses) - so you can see when the thunderbolt module is used, the queues module, nvmem, etc.

are you sure you edited /etc/modules correctly? can you paste the contents here EXACTLY as they appear (use a ` mark either side to format it; do this whenever you paste output. Also learn to use the triple tick on each side of output to format it as a block, it will make reading your comments way easier)
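
To get an overview of a long udevadm monitor capture, the names in parentheses can be tallied. Strictly speaking udev reports that value as the event's subsystem, which often but not always matches a module name. A sketch, with udev_subsystem_counts as a hypothetical helper name:

```shell
# Read `udevadm monitor` output on stdin and count events per subsystem,
# using the name in parentheses at the end of each event line.
# udev_subsystem_counts is a hypothetical helper name, not an existing tool.
udev_subsystem_counts() {
    awk '$NF ~ /^\(.*\)$/ {
             name = $NF
             gsub(/[()]/, "", name)   # strip the surrounding parentheses
             count[name]++
         }
         END { for (n in count) print count[n], n }' | sort -k2
}
```

Used as udev_subsystem_counts < /tmp/udev.log on a saved capture, a healthy cable insertion should show net and queues events alongside thunderbolt and nvmem; a capture with only thunderbolt and nvmem entries suggests the peer host was never enumerated.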

@zombiehoffa

Is there a way to simulate an unplug/replug of the cable? i've got my ser7 cluster mostly working now, except the usb4/thunderbolt ports don't show new host and create the thunderbolt0 devices which are then renamed to en05 and en06 unless/until I physically unplug and replug the cables.

I found some place recommending the following:
echo "1" | tee /sys/bus/pci/devices/0000:c8:00.5/reset
echo "1" | tee /sys/bus/pci/rescan

but that does not work (although the first command definitely causes a lot of logs).

@scyto
Author

scyto commented Jan 6, 2024

@zombiehoffa no idea, i just unplugged them and plugged them back in during testing - note on some hardware which cable is thunderbolt0 and which cable is thunderbolt1 will change, it is not predictable, just FYI

The symptoms you describe are 100% the same as each node not having the modules file correct OR having issues with kernel driver OR having dodgy cables.

@AdriSchmi

AdriSchmi commented Jan 8, 2024

@scyto Hi, sorry for the late answer. Today is my first day of work in 2024.

here is the output from nano /etc/modules

GNU nano 7.2
/etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.
thunderbolt
thunderbolt-net

@scyto
Author

scyto commented Jan 17, 2024

looks good

can you make sure you have this line at the end of the interfaces file, this was missing from my instructions (but is in my live system)

# This must be the last line in the file
post-up /usr/bin/systemctl restart frr.service

@jacoburgin

What is the point of having IPV4 and IPV6 setup for the TB 10. network when if one fails it doesn't failover to the other? I'm still struggling to get IPV4 to come up on boot but should it not failover to IPV6 ?

@travisw3

travisw3 commented Feb 5, 2024

Hello @scyto

Thanks for putting all this together. I have been setting up a cluster using 3 NUC 13 Pros and have made it to the end of this section (all TB4 connections made). However, after rebooting the units I found that one of my connections is causing a hang during reboots and it was putting off quite a bit of heat. Looking at udevadm monitor I get the following output:

KERNEL[676.845850] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [676.846221] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[676.858735] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
KERNEL[676.858743] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
UDEV  [676.859085] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
UDEV  [676.859108] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
KERNEL[676.860384] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
KERNEL[676.860395] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
KERNEL[676.860404] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [676.860709] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
UDEV  [676.860721] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
UDEV  [676.860868] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[678.051243] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [678.054194] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[678.064379] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
KERNEL[678.064390] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
UDEV  [678.064583] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
UDEV  [678.064932] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
KERNEL[678.066627] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
KERNEL[678.066635] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
KERNEL[678.066644] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [678.066716] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active0 (nvmem)
UDEV  [678.066725] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active0 (nvmem)
UDEV  [678.066747] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[750.416859] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
UDEV  [750.419547] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
KERNEL[750.427347] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active0 (nvmem)
KERNEL[750.427361] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active0 (nvmem)
UDEV  [750.427671] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active0 (nvmem)
UDEV  [750.427868] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active0 (nvmem)
KERNEL[754.725589] change   /1-1 (thunderbolt)
UDEV  [754.728059] change   /1-1 (thunderbolt)
KERNEL[755.751440] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
KERNEL[755.751464] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[755.751473] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0 (net)
KERNEL[755.751479] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[755.751485] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[755.751492] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [755.751957] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
UDEV  [755.752120] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[755.752988] move     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/en06 (net)
UDEV  [755.779674] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/en06 (net)
UDEV  [755.779884] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
UDEV  [755.780230] add      /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
UDEV  [755.780506] bind     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV  [756.014526] move     /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/en06 (net)
KERNEL[853.208132] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [853.210861] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[853.219740] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
KERNEL[853.219748] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
UDEV  [853.220110] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
UDEV  [853.220489] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
KERNEL[853.221462] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
KERNEL[853.221469] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
KERNEL[853.221479] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [853.221698] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
UDEV  [853.221707] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
UDEV  [853.221827] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[855.475088] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [855.475427] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
KERNEL[855.485666] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
KERNEL[855.485676] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
UDEV  [855.485944] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
UDEV  [855.486058] add      /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
KERNEL[855.487442] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
KERNEL[855.487450] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
UDEV  [855.487456] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_non_active1 (nvmem)
KERNEL[855.487460] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)
UDEV  [855.487579] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1/nvm_active1 (nvmem)
UDEV  [855.487608] remove   /devices/pci0000:00/0000:00:0d.2/domain0/0-0/usb4_port1/0-0:1.1 (thunderbolt)

In this section I took the cable out of one port and plugged it into the other.
I have tested this with multiple different cables and it seems as if I might have a bad port on this unit. I was just wondering if there is anything else I should be looking at before attempting to return this unit?

@DarkPhyber-hg

Can anyone else confirm as a sanity check that the thunderbolt connections are half duplex? In other words, 25gbps is only in one direction, but if you do a bidirectional iperf3 test, each direction gets cut in half?

@debanyw

debanyw commented Mar 8, 2024

I seem to have a similar problem to @markverg from a few months ago, where the short PCI IDs of the two USB4 ports are identical but the full paths are unique. The PCs I'm using are GEEKOM Mini IT13 units with the 13500H, so admittedly they're not official NUCs with Thunderbolt but I'm hoping they're close enough. I'm also using the latest Proxmox VE 8.1, with kernel 6.5.13-1-pve.

The interesting part is I created the systemd service files with the full path specified on all 3 nodes, but when I connect two nodes (with a USB-IF certified Thunderbolt 4 cable) I get no output from dmesg, udevadm, nothing. However, if I connect a node to my M3 MacBook, udevadm monitor shows the following events and it automatically creates a thunderbolt0 interface:

KERNEL[131.234763] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
UDEV [131.237647] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
KERNEL[131.246678] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active0 (nvmem)
KERNEL[131.246686] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active0 (nvmem)
UDEV [131.247007] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active0 (nvmem)
UDEV [131.247322] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active0 (nvmem)
KERNEL[133.843309] change /1-1 (thunderbolt)
UDEV [133.843605] change /1-1 (thunderbolt)
KERNEL[134.860067] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
KERNEL[134.860086] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[134.860091] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0 (net)
KERNEL[134.860095] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[134.860098] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[134.860223] bind /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV [134.863026] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1 (thunderbolt)
UDEV [134.863319] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
UDEV [134.869495] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0 (net)
UDEV [134.869801] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
UDEV [134.870098] add /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
UDEV [134.870399] bind /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0 (thunderbolt)
KERNEL[180.904358] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active0 (nvmem)
KERNEL[180.904375] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active0 (nvmem)
KERNEL[180.904382] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1 (thunderbolt)
KERNEL[180.904519] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/rx-0 (queues)
KERNEL[180.904533] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0/queues/tx-0 (queues)
KERNEL[180.904541] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/thunderbolt0 (net)
UDEV [180.907119] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_non_active0 (nvmem)
UDEV [180.907213] remove /devices/pci0000:00/0000:00:0d.3/domain1/1-0/usb4_port1/1-0:1.1/nvm_active0 (nvmem)

I'm honestly stumped here, I have no idea why the nodes won't pick each other up but they will pick up some other device. It's baffling. If anyone has any ideas I'd greatly appreciate it.
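
A first thing worth ruling out on each node is whether both modules from the "Load Kernel Modules" step are actually loaded. A minimal sketch, assuming a POSIX shell: `check_modules` is a hypothetical helper (not from the gist) that scans a `/proc/modules`-style listing, where `thunderbolt-net` appears as `thunderbolt_net`. It does not explain why two nodes ignore each other; it only confirms the prerequisite.

```shell
# Hypothetical helper: report whether the two thunderbolt modules are loaded.
# Reads /proc/modules by default; a different file can be passed for testing.
check_modules() {
    modules_file="${1:-/proc/modules}"
    for mod in thunderbolt thunderbolt_net; do
        # Each /proc/modules line starts with the module name followed by a space.
        if grep -q "^${mod} " "$modules_file"; then
            echo "$mod: loaded"
        else
            echo "$mod: missing"
        fi
    done
}
```

Run `check_modules` with no argument on each node to read the live `/proc/modules`; `udevadm monitor` and `dmesg` remain the tools for watching the actual plug events.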

@logiota

logiota commented Apr 10, 2024

is it technically possible to use an intel i5 system as a thunderbolt switch with full lan access with proxmox?
cheaper than a 10gbe switch, you don't need a 10gbe card and you kind of get "PoE"

@scyto
Author

scyto commented Apr 10, 2024

the TB mesh is effectively a router comprised of 3 nodes (not a switch) - you can absolutely access the mesh from the LAN so long as nodes on your LAN know how to reach the mesh next hop.

so for example on my router i have this, it lets any node on my LAN access any of the mesh interfaces
(i note my unifi router just added OSPF so one option would be to setup OSPF in FRR too so i don't need to maintain the static routes)

[image: screenshot of the router's static route entries]
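
As a rough illustration of "know how to reach the mesh next hop" on a Linux host rather than in a router UI, a static route could be added with `ip`. This is a sketch under assumptions: `add_mesh_route` is a hypothetical helper, and both addresses in the example are documentation placeholders, not the prefixes used in this series.

```shell
# Hypothetical helper: point a mesh prefix at the LAN address of one mesh node.
# Set DRY_RUN=1 to print the command instead of running it (running needs root).
add_mesh_route() {
    prefix="$1"      # mesh prefix to reach, e.g. 2001:db8:1::/64 (placeholder)
    next_hop="$2"    # LAN address of a mesh node, e.g. 2001:db8::10 (placeholder)
    set -- ip -6 route replace "$prefix" via "$next_hop"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example (placeholders only):
#   DRY_RUN=1 add_mesh_route 2001:db8:1::/64 2001:db8::10
```

Putting the routes on the router, as described above, avoids having to repeat this on every LAN host.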

@scyto
Author

scyto commented Apr 10, 2024

> Can anyone else confirm as a sanity check that the thunderbolt connections are half duplex? in other words 25gbps is only in one direction, but if you do a bidirectional iperf3 test, each direction gets cut in half?

correct, half duplex is my understanding

also, with a single iperf3 run, if you are seeing more than ~27gbps i would be very surprised and would like to know your hardware!
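
For anyone wanting to repeat the comparison, a sketch of the two runs, assuming iperf3 3.7 or newer (the release that added `--bidir`) and `iperf3 -s` already running on the peer. `tb_oneway_test` and `tb_bidir_test` are hypothetical wrapper names, and the peer address is whatever you assigned to the thunderbolt interface.

```shell
# On the peer node, start the server first:
#   iperf3 -s

# Hypothetical wrapper: single-direction test (client sends to the peer).
tb_oneway_test() {
    peer="$1"
    seconds="${2:-10}"
    iperf3 -c "$peer" -t "$seconds"
}

# Hypothetical wrapper: both directions at the same time.
tb_bidir_test() {
    peer="$1"
    seconds="${2:-10}"
    iperf3 -c "$peer" -t "$seconds" --bidir
}
```

Comparing the one-way figure with the two per-direction figures from the `--bidir` run shows whether each direction is roughly halved on your hardware.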

@scyto
Author

scyto commented Apr 10, 2024

changed auto to allow-hotplug
maybe needed on later kernels for reliability
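
For reference, a sketch of what that change looks like in `/etc/network/interfaces`, shown for `en05` only and using the stanza from the gist body. The commented-out line is the assumed earlier form; the rest of the stanza is unchanged.

```
# auto en05            <- earlier form
allow-hotplug en05
iface en05 inet manual
        mtu 65520
```

The same swap applies to the `en06` stanza.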

@rlabusiness

rlabusiness commented Apr 10, 2024

@scyto Is there a reason you don't have the following lines in the interfaces modifications above? You have one for en05, but not en06.

iface en06 inet6 manual
mtu 65520

@scyto
Author

scyto commented Apr 11, 2024

> @scyto Is there a reason you don't have the following lines in the interfaces modifications above? You have one for en05, but not en06.
>
> iface en06 inet6 manual
> mtu 65520

fixed, i need to be more careful cutting and pasting ;-)

@jackzjh001

@scyto Are you getting normal speed of thunderbolt network connection? I'm using the latest PVE8.1 with linux kernel 6.5, but the iperf3 shows the speed between it and a windows pc at only 40-60Mbps.

@scyto
Author

scyto commented Apr 14, 2024

> Are you getting normal speed of thunderbolt network connection?

on 6.5.13-3-pve yes i am seeing 26.7Gbps

Look at the spoiler section here: https://forum.proxmox.com/threads/intel-nuc-13-pro-thunderbolt-ring-network-ceph-cluster.131107/post-652551 (oddly, i did see a drop to 18gbps in one direction and someone else saw a drop to 21gbps)
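
When comparing numbers like these, one setting worth confirming on the Proxmox side is that the `mtu 65520` from the interfaces file actually took effect. A minimal sketch, assuming the `en05`/`en06` names from this gist; `check_mtu` is a hypothetical helper, and a matching MTU does not by itself explain or fix a slow link.

```shell
# Hypothetical helper: compare an interface's live MTU with the expected value.
# Reads /sys/class/net/<iface>/mtu; the sysfs root can be overridden for testing.
check_mtu() {
    iface="$1"
    expected="${2:-65520}"
    sysfs_root="${3:-/sys/class/net}"
    actual=$(cat "$sysfs_root/$iface/mtu") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "$iface: mtu $actual (ok)"
    else
        echo "$iface: mtu $actual (expected $expected)"
        return 1
    fi
}
```

For example, `check_mtu en05` and `check_mtu en06` on each node, with the cables connected so the interfaces exist.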
