
@scyto
Last active December 29, 2024 21:13

Enable Dual Stack (IPv4 and IPv6) OpenFabric Routing

This gist is part of this series.

This assumes you are running Proxmox 8.2 and that the line source /etc/network/interfaces.d/* is at the end of the interfaces file (this is automatically added to both new and upgraded installations of Proxmox 8.2).

This changes the previous file design. Thanks to @NRGNet for the suggestion to move the thunderbolt settings to a file in /etc/network/interfaces.d; it makes the system much more reliable in general and more maintainable, especially for folks using IPv4 on the private cluster network (I still recommend the IPv6 FC00 network you will see in these docs).

This will result in an IPv4 and IPv6 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node.

NOTES on Dual Stack

I have included this for completeness; I only run the FC00:: IPv6 network, as Ceph does not support dual stack, and I strongly recommend you consider using only IPv6. For example, for Ceph do not dual stack: use either IPv4 or IPv6 addresses for all the monitors, MDS, and daemons. Despite the docs implying dual stack is OK, my finding on Quincy is that it is funky...

Defining thunderbolt network

Create a new file with nano /etc/network/interfaces.d/thunderbolt and populate it with the following. Remember, X should match your node number: for example 1, 2, or 3.

auto lo:0
iface lo:0 inet static
        address 10.0.0.8X/32
        
auto lo:6
iface lo:6 inet static
        address fc00::8X/128
        
allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

Save the file, then repeat on each node.
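
As a worked example (assuming the 10.0.0.8X / fc00::8X scheme used throughout this gist), node 1's loopback stanzas would read:

auto lo:0
iface lo:0 inet static
        address 10.0.0.81/32

auto lo:6
iface lo:6 inet static
        address fc00::81/128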

Enable IPv4 and IPv6 forwarding

  1. use nano /etc/sysctl.conf to open the file
  2. uncomment #net.ipv6.conf.all.forwarding=1 (remove the # symbol)
  3. uncomment #net.ipv4.ip_forward=1 (remove the # symbol)
  4. save the file
  5. issue reboot now for a complete reboot
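
Optionally, you can verify forwarding is active after the reboot; both of these should print 1:

sysctl -n net.ipv4.ip_forward
sysctl -n net.ipv6.conf.all.forwarding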

FRR Setup

Install FRR

Install Free Range Routing (FRR) with apt install frr

Enable the fabricd daemon

  1. edit the frr daemons file (nano /etc/frr/daemons) to change fabricd=no to fabricd=yes
  2. save the file
  3. restart the service with systemctl restart frr
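
To confirm it took effect, something like the following should show the frr service active with a fabricd process running:

systemctl status frr
pgrep -a fabricd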

Mitigate FRR Timing Issues at Boot

Add post-up command to /etc/network/interfaces

sudo nano /etc/network/interfaces

Add post-up /usr/bin/systemctl restart frr.service as the last line in the file (this should go after the line that starts with source)

NOTE for Minisforum MS-01 users

make the post-up line above read post-up sleep 5 && /usr/bin/systemctl restart frr.service instead; this has been verified to be required due to timing issues seen on those units. The exact cause is unknown, and it may be needed on other hardware too.
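
As a sketch, the tail of /etc/network/interfaces on an MS-01 would then end with (note that some commenters below report the post-up line instead needs to go before the source line):

source /etc/network/interfaces.d/*
post-up sleep 5 && /usr/bin/systemctl restart frr.service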

Configure OpenFabric (perform on all nodes)

  1. enter the FRR shell with vtysh
  2. optionally show the current config with show running-config
  3. enter the configure mode with configure
  4. Apply the below configuration (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to enter the last !. Also check there were no errors in response to the pasted text).

Note: the X should be the number of the node you are working on, as an example - 0.0.0.1, 0.0.0.2 or 0.0.0.3.

ip forwarding
ipv6 forwarding
!
interface en05
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface en06
ip router openfabric 1
ipv6 router openfabric 1
exit
!
interface lo
ip router openfabric 1
ipv6 router openfabric 1
openfabric passive
exit
!
router openfabric 1
net 49.0000.0000.000X.00
exit
!

  1. you may need to press return after the last ! to get to a new line - if so, do this

  2. exit the configure mode with the command end

  3. save the config with write memory

  4. check the config applied correctly with show running-config - note the order of the items will be different from how you entered them, and that's OK. (If you made a mistake, I found the easiest fix was to edit /etc/frr/frr.conf - but be careful if you do that.)

  5. use the command exit to leave setup

  6. repeat these steps on the other nodes

  7. once you have configured all 3 nodes, issue the command vtysh -c "show openfabric topology" - if you did everything right you will see:

Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
10.0.0.81/32         IP internal  0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
10.0.0.82/32         IP TE        20     pve2                 en06      pve2(4)
10.0.0.83/32         IP TE        20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve1                                                                  
fc00::81/128         IP6 internal 0                                     pve1(4)
pve2                 TE-IS        10     pve2                 en06      pve1(4)
pve3                 TE-IS        10     pve3                 en05      pve1(4)
fc00::82/128         IP6 internal 20     pve2                 en06      pve2(4)
fc00::83/128         IP6 internal 20     pve3                 en05      pve3(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

Now you should be able to ping each node from every node across the thunderbolt mesh, using IPv4 or IPv6 as you see fit.
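
For example, from pve1 (using the addressing scheme above):

ping -c 3 10.0.0.82
ping -c 3 fc00::83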

@e1ysion commented Oct 11, 2024

Okay, so the cluster is working in general, but I have to restart the frr.service before it loads the changes - before the restart it would only show:

interface en05
ipv6 router openfabric 1
exit

After restart:

interface en05
ip router openfabric 1
ipv6 router openfabric 1
exit

I have the restart delay command in the thunderbolt config.

I also unfortunately get this error when typing ifup en05:
warning: en05: up cmd '/etc/network/if-up.d/thunderbolt-affinity' failed ([Errno 13] Permission denied: '/etc/network/if-up.d/thunderbolt-affinity')

Any tips?

EDIT:
Managed to solve the permission issue with chmod 755 /etc/network/if-up.d/thunderbolt-affinity
But the interface "issue" is still there. The cluster is working, but IPv4 only shows after a restart; I don't know if this is a problem.

@Dasd1ngo commented Oct 20, 2024

I had a very similar issue; the config (the IPv4 part) was not persisting in /etc/frr/frr.conf.

Edit: manually updating /etc/frr/frr.conf (vi /etc/frr/frr.conf) solves the problem temporarily.

@nickglott commented Oct 20, 2024

I just realized I am on my other account; NRGnet is me as well, just FYI if you are looking for previous comments.

Mine has been rock solid for a while now. My thunderbolt-affinity script was a bit different from what @Allistah was using, and it always worked for me, so I left it; maybe try that? 0-7 are my P-cores; change that to your P-cores if you want to give it a shot.

#!/bin/bash

# Check if the interface is either en05 or en06
if [ "$IFACE" = "en05" ] || [ "$IFACE" = "en06" ]; then
    # Pin Thunderbolt interrupts to the P-cores (0-7 here)
    grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-7 | tee "/proc/irq/{}/smp_affinity_list"'
fi

Make sure you chmod +x /etc/network/if-up.d/thunderbolt-affinity.

Your postbin has expired so I can't see your setup; repost and I can give it a look. :D

Generally, when it did not come up on boot it means one of a few things: the interfaces are not allow-hotplug for en05 and en06, ifup is not bringing the interface online (if you do ip a you should see both en05 and en06 with a state of UP), or they are out of sync. Messing around with waits might help. Putting them in pve-en05.sh and pve-en06.sh did not work for me and prevented one interface from coming up. The only wait I am using is in /etc/network/interfaces: post-up sleep 5 && /usr/bin/systemctl restart frr.service && sleep 20. That second 20s sleep is special for me because I am not using Ceph but ZFS replication, and Proxmox would try to migrate VMs before FRR fully connected and they would fail; the sleep prevents that in my case.

As for them being out of sync: if the interface is not in a state of UP on the other node, on a reboot the one node might not come up. In my experience, when they are out of sync, running ifreload -a on the opposite node would re-sync them, but only if the interfaces are UP on both sides. The quick test is to power all 3 off and turn all 3 on at the same time and see if it is running like it should. When both IPv4 and IPv6 were set up, I would see IPv4 fall out of sync while IPv6 stayed good.

This was a while back, but it describes how they can get out of sync. I think I did find another scenario where they can lose sync and posted it in another comment, but I couldn't find it quickly.

and what symptoms do you see when IPv4 is an issue? I just found my cluster had got itself into a state where IPv4 pings were being routed to the default gateway and the topology looked wrong on every node (only the node's own IP in the IPv4 topology on each node). This is only the second time I have ever seen this.
Rebooting the nodes as part of the move to the interfaces.d/thunderbolt file fixed it.
do you see EXACTLY the same symptoms in your 'IPv4 does not come up' scenario?

To clarify what I mean by manual intervention being needed for IPv4, and why I think only IPv6 should be used (I assume this is a bug in frr):

I know these scenarios would be very rare and unlikely, especially if you are running HA, as you need to always keep 2 nodes up. But consider an extended power outage (ideally all nodes would power back up at the same time): nodes on different UPSes, one node delayed at boot by a USB drive, a bad drive, or a PCIe device, or nodes on different power circuits where a breaker pops or dies. Basically, if there is a will there's a way, and I don't like it.

The only good thing is that 2 nodes will always sync, so you shouldn't get any services not starting, HA issues, or data loss (with the exception of only 2 copies of data and 1 copy on the node that does not sync), just a degraded setup.

If any node is on or started, and another node is started, they will sync. If after that you turn on the last node, it will only sync to the 1st started node. vtysh -c "show openfabric topology" will show it is connected through the 1st started node, but you cannot ping it. To fix it, run ifreload -a or systemctl restart frr on the 2nd started node (the one it can't ping). (Why it connects to the 1st node and not the 2nd? Your guess is probably better than what I can think of, because I have nothing.)

If 1 node is up and you start the 2nd and 3rd at the same time, only one of those nodes connects to the 1st node, but they will connect to each other. The node that does not connect to the 1st node will show it is connected through the one that did, but you cannot ping it. To fix it, run ifreload -a or systemctl restart frr on the 1st started (or never-off) node, the one it can't ping. (This is even weirder: it happens to be the opposite node from the last scenario that has to run the fix, which is just mind-puzzling to me.)

If you start 2 nodes at the same time, they will sync; then, when you start the last node, it will not sync to either of the other 2 nodes. vtysh -c "show openfabric topology" will show it is connected, but you cannot ping it. To fix it, run ifreload -a or systemctl restart frr on both of the first 2 started nodes; they can be run at different times or with a delay between them. (If you only do it on one, the 3rd started node will show it is connected "through" that node, but you still can't ping it.)

If you start all 3 nodes at the same time they all sync.

2 node configurations are also unaffected.

IPv6 works and syncs in any of these scenarios.

@e1ysion commented Oct 20, 2024

Hey nickglott, here is my pastebin again; appreciate the help:
https://privatebin.net/?b78b2da69af90ac7#5q4hXvEzqF8XPZnerzAk4QkJjXwQtwSUZQQRcZ7TcLN9

@nickglott

@e1ysion Can you post your ip a for all nodes?

@nickglott

@e1ysion I think I see it: it looks like FRR is not being restarted on boot. The post-up in /etc/network/interfaces.d/thunderbolt won't work; it only works in /etc/network/interfaces.

Remove it from /etc/network/interfaces.d/thunderbolt and add it before source /etc/network/interfaces.d/* in /etc/network/interfaces

I don't think this will affect that, but frr can be a little funky sometimes; the above might fix what's below it, and if it doesn't, you might need to play with it.

In the running config for frr, ip router openfabric 1 is not listed for en05, but it is in the config file /etc/frr/frr.conf; you might need to re-do the configure and write memory steps in the gist.

@e1ysion commented Oct 20, 2024

Thanks, will do

As per the chat here, it was written somewhere to put in ip router openfabric 0 and not 1 - should I switch to 1?

@nickglott

@e1ysion Can you post your ip a for all nodes?

Yes sir: PVE1 https://privatebin.net/?8739e5eaab48b40c#AB37gyGXSK2WLiQfKri9qXgPQeRi5GZhrWAeJgNBa4RN

PVE2 https://privatebin.net/?05ac2b2cb4413dbd#5Tvy8JBKmW1mf6TRTBuvLBBnrYUurZfnDGPPsJ842vs9

PVE3 https://privatebin.net/?f2ae90e2aa60b158#ESVo1a6NEKr8SpeFbhXYM5C23QyJEoFbBt21EQeEdpTZ

PVE4 https://privatebin.net/?c1f14e1adf091ea5#DZMHDgBb6nEHXKi55gfXAqSZYNfPP979PizunHPVmSUL

PVE5 https://privatebin.net/?9443d3bd9f926435#7pxsR6cEJXtfb8ZYeyUNFU9JHrGuYSNQqp4XSZEgSDoR

If I understand correctly (I have not gone through all your previous posts), all your thunderbolt ports are en05/en06. PVE1 and 2 look good, PVE3 is only showing en06, PVE4 is not showing either, and PVE5 is only showing en05.

I think you need to figure out why that is to get a full mesh. On my devices, having the sleep in pve-en05.sh and pve-en06.sh would make one interface not come up, which is why I moved my sleep to /etc/network/interfaces. I know you said you are using a few different devices, and they tend to act differently. Scyto doesn't need any sleeps for his to work, so you kind of just have to play around and see what works.

@e1ysion commented Oct 20, 2024

I see. Do you recommend I try your frr config as well?

frr defaults traditional
hostname TheCore-0X
log syslog informational
ip forwarding
ipv6 forwarding
service integrated-vtysh-config
!
interface lo
ip address 10.0.10.X0/32
ip router openfabric 1
ipv6 address fc00::X0/128
ipv6 router openfabric 1
openfabric passive
!
interface en05
ip router openfabric 1
ipv6 router openfabric 1
openfabric csnp-interval 2
openfabric hello-interval 1
openfabric hello-multiplier 2
!
interface en06
ip router openfabric 1
ipv6 router openfabric 1
openfabric csnp-interval 2
openfabric hello-interval 1
openfabric hello-multiplier 2
!
line vty
!
router openfabric 1
net 49.0001.XXXX.XXXX.XXXX.00
fabric-tier 0
lsp-gen-interval 1
max-lsp-lifetime 600
lsp-refresh-interval 180

@nickglott

As per the chat here, it was written somewhere to put in ip router openfabric 0 and not 1 - should I switch to 1?

I don't understand FRR that much, but I have always used 1, so I dunno. I think everyone has had it set to 1.

@nickglott

I see. Do you recommend I try your frr config as well?

I made that based off of this guide: https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server. You could try it, but you would need to tweak it a bit (hostname, NET ID, IPv4 address, and IPv6 address), edit your /etc/network/interfaces.d/thunderbolt to not have the lo interfaces, and remove the IP forwarding lines from /etc/sysctl.conf.

In the end it does the same thing; I have not seen any difference from the extra lines the Proxmox guide has that this gist doesn't, tbh.

@e1ysion commented Oct 20, 2024

Thanks man, appreciate your help. I will play around with the sleep a bit and leave my config be for now (apart from your recommendations).

@nickglott commented Oct 20, 2024

Fixing this should solve your having to manually restart frr on boot, but I think you mainly need to figure out how to get en05 and en06 to show "state UP" on all nodes when running ip a.

With it set up correctly, when running vtysh -c "show openfabric topology" you should see an en05 and en06 entry for each node it is connected to, like this:

root@TheCore-01:/# vtysh -c "show openfabric topology"
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
TheCore-01                                                            
10.0.10.10/32        IP internal  0                                     TheCore-01(4)
TheCore-02           TE-IS        4094   TheCore-02           en05      TheCore-01(4)
TheCore-03           TE-IS        4094   TheCore-03           en06      TheCore-01(4)
10.0.10.20/32        IP TE        4104   TheCore-02           en05      TheCore-02(4)
10.0.10.30/32        IP TE        4104   TheCore-03           en06      TheCore-03(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
TheCore-01                                                            
fc00::10/128         IP6 internal 0                                     TheCore-01(4)
TheCore-02           TE-IS        4094   TheCore-02           en05      TheCore-01(4)
TheCore-03           TE-IS        4094   TheCore-03           en06      TheCore-01(4)
fc00::20/128         IP6 internal 4104   TheCore-02           en05      TheCore-02(4)
fc00::30/128         IP6 internal 4104   TheCore-03           en06      TheCore-03(4)

@nickglott commented Oct 20, 2024

@e1ysion One more note: I could only do testing with 3 nodes, since that is all I have, so I have no clue about how they fall out of sync when certain nodes are rebooted; I would think mainly the same principle applies - boot all nodes at the same time, and only restart one at a time. I think the only way you will know is to try pinging each device, see what you can't ping, see which interface that is, and try to adjust settings for that one. Remember, if it's down on one node it probably is down on the other node as well, so it might not be that specific node's issue but the other one's. Going from 3 to 5 nodes really complicates things, which is why a 25G NIC and 25G switch would be better, although I know you can't just add them to mini PCs :D so good luck.

If you are still running into big issues, you might want to drop to 3 nodes, get them running perfectly, and add a node one at a time; that is how I would probably go about troubleshooting it.

@e1ysion commented Oct 21, 2024

Thanks man :) One more question: do you know what the reason for the unrecognized TB ports could be? After a reboot, sometimes a plug/unplug is not properly recognized, and I have to replug until it eventually gets registered under udevadm monitor.

@nickglott

@e1ysion If there are no errors with the renaming/link files or with pve-en05.sh and pve-en06.sh (double-check they are executable; also try no sleep in them, or maybe longer/shorter sleeps), it could be the cables; they do tend to be a bit finicky. Try moving a cable to a different node or flip-flopping it and see if the problem follows the cable. Most of us, I think, are using the OWC cables that are quite short, as we have had the most success with them.

The only other thing I can think of would be a hardware/BIOS issue.

@contributorr

@scyto I think it needs to be post-up sleep 5 && /usr/bin/systemctl restart frr.service; the way you have it in the guide won't work, as "post-up" is the directive that triggers it. Also, I think it needs to be in /etc/network/interfaces, not in the thunderbolt file, as I don't think it is loading...

I just confirmed it.

@scyto Currently your gist is non-working because of this

@Allistah I can confirm that I do need to pin thunderbolt to P-cores to get full bandwidth and no (or low) retries. I tried putting the script in /etc/init.d/thunderbolt-affinity and can confirm it is not loading on boot. From what I have read quickly, it needs to follow a certain framework: https://manpages.debian.org/testing/sysvinit-utils/init-d-script.5.en.html.

I tried /etc/network/if-up.d/thunderbolt-affinity and can confirm it is working; however, I don't think this is the proper way to do it, as it doesn't need to be run on every ifup. I don't think it would hurt anything... but I think I am going to do it a different way by adding it to /etc/rc.local so it runs once on boot (init.d might be the proper way, but I don't currently have the time to make that script).

A question I have is which code to use. I saw both on the Proxmox forum; I was using the 1st one, but it required manually changing "0-7" to your P-cores. Is the 2nd one reading what the P-cores are, since they will be different for other CPUs? I don't know what it is printing, as I am not great with code haha.

#!/bin/bash

# Pin all thunderbolt IRQs to a hard-coded CPU list (0-7)
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-7 | tee "/proc/irq/{}/smp_affinity_list"'

or

#!/bin/bash
# Write a fixed hex bitmask to each thunderbolt IRQ (0f = CPUs 0-3)
for id in $(grep 'thunderbolt' /proc/interrupts | awk '{print $1}' | cut -d ':' -f1); do
    echo 0f > /proc/irq/$id/smp_affinity
done

Thanks a LOT!

Went from 400-600 retries in 1s to 0-20 retries and speed went from 16gbit to 26gbit.
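
For what it's worth, neither snippet actually detects P-cores: the first pins the IRQs to a hard-coded CPU list (0-7), and the second writes a fixed hex bitmask (0f = CPUs 0-3), so it is not reading your P-cores either. A rough, untested sketch that reads the P-core list from sysfs on Intel hybrid CPUs (assuming your kernel exposes /sys/devices/cpu_core/cpus; verify that path before relying on it) would be:

#!/bin/bash
# P-core list as reported by the kernel on Intel hybrid CPUs,
# falling back to all CPUs if the sysfs node is absent
PCORES=$(cat /sys/devices/cpu_core/cpus 2>/dev/null || echo "0-$(($(nproc)-1))")
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c "echo $PCORES > /proc/irq/{}/smp_affinity_list"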

@nimro27 commented Oct 27, 2024

I had some frr boot timing issues with my Minisforum MS-01, as described in the gist. However, adding the following line to /etc/network/interfaces was not working reliably for me:
post-up sleep 5 && /usr/bin/systemctl restart frr.service
Sometimes, according to dmesg, it took quite some time until the interfaces en05 and en06 were correctly set up. For some reason this led to only IPv6 routing being initialized but not IPv4 (not sure why).

So I added the following file /etc/systemd/system/frr.service.d/dependencies.conf with this content:

[Unit]
BindsTo=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

This ensures that frr only starts after en05 and en06 are fully initialized. This worked great for me, and the post-up command is no longer needed in the interfaces file. I thought I would share this in case someone else has similar problems. I would also be happy to hear your thoughts on this solution.

@Allistah

Thanks a LOT! Went from 400-600 retries in 1s to 0-20 retries and speed went from 16gbit to 26gbit.

Awesome, glad that worked for ya! It’s a must for me to have this. There are others that don’t need it but we’re not 100% certain why.

@e1ysion commented Nov 1, 2024

Hey, it says the subfolder frr.service.d does not exist. Should I just create it and put the .conf file there, as you described?

@Allistah commented Nov 1, 2024

This is fantastic news! I'm going to put this in my notes and give it a try. Seems like a much better method than just delays which may or may not work reliably.

@nimro27 - Do we just create the directory called frr.service.d if it doesn't exist?

@nimro27 commented Nov 2, 2024

Yes just create it if it does not exist. That's what I did too.
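
For example, on each node, something like:

mkdir -p /etc/systemd/system/frr.service.d
nano /etc/systemd/system/frr.service.d/dependencies.conf
systemctl daemon-reload

The systemctl daemon-reload makes systemd pick up the new drop-in.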

@IndianaJoe1216

@nimro27 Thank you so much! This solved it for me on my 3 node MS-01 cluster!

@FrankSchoene

When I add this file, the frr.service does not start anymore, not on boot and not manually. Any idea?

@KyGunsAndRadio

@FrankSchoene @nimro27 Just a warning: when I attempted to use this approach (dependencies.conf) it caused a couple of problems. It would occasionally prevent en05/06 from coming up during boot (logging dependency errors), but more importantly it would also cause the frr service on all the other nodes to shut down when the current node was rebooting. In my case, I had to remove the dependencies.conf and go back to using a post-up with sleep 10. One minor difference in my case: I used a script in /etc/network/if-up.d/ instead of explicitly adding post-up to the interfaces file. Everything seems to be working well now and survives reboots of any node. I'm running 3 MS-01's.

@nimro27 commented Nov 23, 2024

Thanks for the warning, great catch! I did some more tests and changed BindsTo to Wants in the dependencies.conf; this solved the case where frr is shut down on the other nodes when one goes down.

[Unit]
Wants=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device
After=sys-subsystem-net-devices-en05.device sys-subsystem-net-devices-en06.device

However, I could not replicate the case where en05/06 would not come up during boot. Could you explain the dependency errors in the logs a bit more? Are they related to the shutdown issue with BindsTo?

@AdriSchmi

My routing looks like this:

[image: screenshot of routing table]

When I try to set up Ceph, I cannot add the monitors on IPv4 or IPv6.

@Roomba5 commented Dec 28, 2024

Just wondering if adding the thunderbolt networking to an existing cluster would cause issues? I have MS-01's with a spare NVMe in each that I could use for Ceph, but I'm unsure if this approach would require me to start from scratch and then restore some PBS backups.

Also, is anyone using kernel 6.8.12-4-pve?

@alexdelprete

Thanks for this. It seems it solved some startup issues I was having on my MS-01 nodes. I'll have to do some more tests to be sure.
