@scyto
Last active February 22, 2024 18:30
IPv4 ospf mesh network for ceph

Enable OSPF Routing on Thunderbolt Mesh

This has been deprecated.

It is now superseded by OpenFabric routing; see here.

Continue at your own peril; this is for reference only now.

Old Gist

This will result in an IPv4 routable mesh network that can survive any one node failure or any one cable failure. All the steps in this section must be performed on each node

Please note the main section of this gist describes IPv4 on the mesh. Lower down you will find additional files that capture:

  1. differences if you want dual stack IPv4 / IPv6 routing (this is now what I run since writing the original gist)
  2. OpenFabric instead of OSPF (to do)

this gist is part of this series

Key Parameters

Note: I used the 10.x IPv4 space as it is not used anywhere else on my network. YMMV.

lo = loopback; en05/en06 = the Thunderbolt ports

Node 1:

  • lo:0 = 10.0.0.81/32
  • en05 = 10.0.0.5/30
  • en06 = 10.0.0.9/30
  • ospf router-id = 0.0.0.1

Node 2:

  • lo:0 = 10.0.0.82/32
  • en05 = 10.0.0.10/30
  • en06 = 10.0.0.13/30
  • ospf router-id = 0.0.0.2

Node 3:

  • lo:0 = 10.0.0.83/32
  • en05 = 10.0.0.14/30
  • en06 = 10.0.0.6/30
  • ospf router-id = 0.0.0.3
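
Each Thunderbolt cable is its own /30, so the two ends of a cable have to land in the same /30 network. A quick sanity check of the table above, as a minimal sketch: the net30 helper name is made up for illustration, and it only masks the last octet, which is enough for the 10.0.0.x addresses used here.

```shell
# net30 IP: print the /30 network an IPv4 address belongs to.
# Hypothetical helper; only handles /30s that sit inside the last octet.
net30() {
    local ip="$1"
    local prefix="${ip%.*}"    # first three octets, e.g. 10.0.0
    local last="${ip##*.}"     # last octet, e.g. 5
    echo "${prefix}.$(( last & 252 ))"   # 252 = 11111100b, the /30 mask for one octet
}

# Each pair is the two ends of one cable, taken from the table above.
for pair in "10.0.0.5 10.0.0.6" "10.0.0.9 10.0.0.10" "10.0.0.13 10.0.0.14"; do
    set -- $pair
    if [ "$(net30 "$1")" = "$(net30 "$2")" ]; then
        echo "$1 <-> $2 share $(net30 "$1")/30"
    else
        echo "$1 and $2 are NOT in the same /30" >&2
    fi
done
```

If you pick different addresses, swap your own pairs into the loop before assigning them to the interfaces.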

Enable IPv4 forwarding

IPv4 forwarding must be enabled so that each node can route traffic between its two Thunderbolt links.

  • uncomment #net.ipv4.ip_forward=1 using nano /etc/sysctl.conf (remove the # symbol and save the file)
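
If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming GNU sed (as shipped on Debian/Proxmox) and that the line is present but commented out; the function name is illustrative only.

```shell
# enable_ipv4_forwarding [FILE]: uncomment net.ipv4.ip_forward=1 in FILE.
# FILE defaults to /etc/sysctl.conf; pass another path to try it on a copy first.
enable_ipv4_forwarding() {
    local file="${1:-/etc/sysctl.conf}"
    # Strip the leading '#' (and any spaces after it) from the ip_forward line only.
    sed -i 's/^#[[:space:]]*net\.ipv4\.ip_forward=1/net.ipv4.ip_forward=1/' "$file"
}

# On a node (as root) you would then run:
#   enable_ipv4_forwarding
#   sysctl -p                      # reload /etc/sysctl.conf without a reboot
#   sysctl net.ipv4.ip_forward     # print the current value
```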

Create Loopback interface

Doing this means we don't have to give each Thunderbolt interface a manual IPv6 address, and that these addresses stay constant no matter what. Add the following to each node using nano /etc/network/interfaces

This should go under the auto lo section, and for each node the X should be 1, 2 or 3 depending on the node.

auto lo:0
iface lo:0 inet static
        address 10.0.0.8X/32

so on the first node it would look something like this:

...
auto lo
iface lo inet loopback
 
auto lo:0
iface lo:0 inet static
        address 10.0.0.81/32
...

Save file.

Assign IP address to en05 and en06 using the GUI

  1. use the table further up and assign addresses
  2. after applying both addresses, remember to hit the Apply Configuration button

Install OSPF (perform on all nodes)

  1. Install Free Range Routing (FRR): apt install frr
  2. Edit the FRR config file: nano /etc/frr/daemons
  3. Adjust ospfd=no to ospfd=yes
  4. save the file
  5. restart the service with systemctl restart frr
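
Steps 2 to 4 above can also be done non-interactively. A rough sketch, assuming GNU sed and the stock name=no lines in the daemons file; the function name is hypothetical.

```shell
# enable_frr_daemon FILE NAME: flip NAME=no to NAME=yes in an FRR daemons file.
# The pattern is anchored at both ends, so enabling ospfd does not touch ospf6d.
enable_frr_daemon() {
    local file="$1"
    local name="$2"
    sed -i "s/^${name}=no\$/${name}=yes/" "$file"
}

# On a node (as root):
#   enable_frr_daemon /etc/frr/daemons ospfd
#   systemctl restart frr
```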

Configure OSPF (perform on all nodes)

  1. enter the FRR shell with vtysh
  2. optionally show the current config with show running-config
  3. enter the configure mode with configure
  4. Apply the configuration below (it is possible to cut and paste this into the shell instead of typing it manually; you may need to press return to set the last !. Also check there were no errors in response to the pasted text). Note: the X should be the number of the node you are working on, so for my setup this would be 0.0.0.1, 0.0.0.2 or 0.0.0.3.
ip forwarding
!
router ospf
 ospf router-id 0.0.0.X
 log-adjacency-changes
 exit
!
interface lo
 ip ospf area 0
 exit
!
interface en05
 ip ospf area 0
 ip ospf network broadcast
 exit
!
interface en06
 ip ospf area 0
 ip ospf network broadcast
 exit
!

  1. you may need to press return after the last ! to get to a new line - if so, do this
  2. exit the configure mode with the command end
  3. save the config with write memory
  4. check the config applied correctly with show running-config - note the order of the items will be different to how you entered them and that's OK. (If you made a mistake, I found the easiest way to fix it was to edit /etc/frr/frr.conf - but be careful if you do that.)
  5. use the command exit to leave setup
  6. repeat the steps above on the other 2 nodes
  7. once you have configured all 3 nodes issue the command vtysh -c "show ip ospf neighbor" you will see:
root@pve1:~# vtysh -c "show ip ospf neighbor"

Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
0.0.0.2           1 Full/DROther    52m26s            33.951s 10.0.0.10       en06:10.0.0.9                        0     0     0
0.0.0.3           1 Full/DROther    51m56s            33.444s 10.0.0.6        en05:10.0.0.5                        0     0     0

  1. now issue the command vtysh -c "show ip route" and you will see:
root@pve1:~# vtysh -c "show ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

C>* 10.0.0.4/30 is directly connected, en05, 00:53:16
O>* 10.0.0.5/32 [110/0] is directly connected, en05, weight 1, 00:53:16
O   10.0.0.6/32 [110/10] via 10.0.0.6, en05 inactive, weight 1, 00:53:11
C>* 10.0.0.8/30 is directly connected, en06, 00:53:46
O>* 10.0.0.9/32 [110/0] is directly connected, en06, weight 1, 00:53:46
O   10.0.0.10/32 [110/10] via 10.0.0.10, en06 inactive, weight 1, 00:53:41
O>* 10.0.0.13/32 [110/10] via 10.0.0.10, en06, weight 1, 00:53:32
O>* 10.0.0.14/32 [110/10] via 10.0.0.6, en05, weight 1, 00:53:11
O   10.0.0.81/32 [110/0] is directly connected, lo, weight 1, 12:15:09
C>* 10.0.0.81/32 is directly connected, lo, 12:15:09
O>* 10.0.0.82/32 [110/10] via 10.0.0.10, en06, weight 1, 00:53:41
O>* 10.0.0.83/32 [110/10] via 10.0.0.6, en05, weight 1, 00:53:11
C>* 192.168.1.0/24 is directly connected, vmbr0, 12:15:06

and lastly ip route

root@pve1:~# ip route
default via 192.168.1.1 dev vmbr0 proto kernel onlink 
10.0.0.4/30 dev en05 proto kernel scope link src 10.0.0.5 
10.0.0.8/30 dev en06 proto kernel scope link src 10.0.0.9 
10.0.0.12/30 nhid 53 proto ospf metric 20 
        nexthop via 10.0.0.6 dev en05 weight 1 
        nexthop via 10.0.0.10 dev en06 weight 1 
10.0.0.82 nhid 54 via 10.0.0.10 dev en06 proto ospf metric 20 
10.0.0.83 nhid 33 via 10.0.0.6 dev en05 proto ospf metric 20 
192.168.1.0/24 dev vmbr0 proto kernel scope link src 192.168.1.81 
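
If you want to check the adjacencies from a script rather than by eye, something like this works on the neighbor output shown earlier. It is only a sketch: the function name is made up, and it assumes the State column is the third whitespace-separated field, as in the output above.

```shell
# count_full_neighbors: read "show ip ospf neighbor" output on stdin and
# print how many neighbors are in a Full/... state.
count_full_neighbors() {
    # Field 3 is the State column (e.g. Full/DROther); the header line has "Pri" there.
    awk '$3 ~ /^Full\// { n++ } END { print n + 0 }'
}

# On a node:
#   vtysh -c "show ip ospf neighbor" | count_full_neighbors
# With both cables connected on a 3-node mesh each node should report 2.
```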

Testing Example

You can now test the network by pinging the IPv4 loopback addresses of the other nodes. For example (using my IPs defined earlier):

  • ping 10.0.0.81
  • ping 10.0.0.82
  • ping 10.0.0.83

Now pull one of the TB cables and repeat the test.

You should still be able to ping all nodes!!
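
To make the before/after comparison less tedious, the pings can be wrapped in a small loop. A minimal sketch: check_mesh and the PING override are names invented here, and the default flags (-c 2 -W 1) assume the usual Linux iputils ping.

```shell
# check_mesh IP...: ping each address and report which ones answer.
# Set PING to override the ping command (handy for a dry run).
check_mesh() {
    local failed=0
    local ip
    for ip in "$@"; do
        # Default: 2 probes, 1 second timeout per probe.
        if ${PING:-ping -c 2 -W 1} "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}

# Run once with all cables in, pull a cable, then run it again:
#   check_mesh 10.0.0.81 10.0.0.82 10.0.0.83
```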

This supplement is for if you want dual stack IPv4 and IPv6

Note: if you are running Ceph, it should be either IPv4 or IPv6 for all the monitors, MDS and daemons - do not try to dual stack it. Despite the docs implying it is OK, my findings on Quincy are that it is funky, so stick to IPv4 or IPv6. It is possible to switch Ceph back and forth, but be very careful; it will be scary (tl;dr pick one).

Create an IPv6 loopback

In /etc/network/interfaces you will want an IPv6 loopback, separate from the IPv4 one. My best practice is to use the same number in the last hextet as the last octet from IPv4 (makes things easy to remember).

So PVE1 would look like this; increment the last digit of the IP for each subsequent node.

...
auto lo:6
iface lo:6 inet static
        address fc00::81/128

...

Enable IPv4 and IPv6 forwarding

  1. use nano /etc/sysctl.conf to open the file
  2. uncomment #net.ipv6.conf.all.forwarding=1 (remove the # symbol)
  3. uncomment #net.ipv4.ip_forward=1 (remove the # symbol)
  4. save the file

FRR Setup

This is the content for FRR - remember to increment the router-ids on each node you use this on where you see X

Edit the FRR daemons file to change ospf6d=no to ospf6d=yes and ospfd=no to ospfd=yes

This is the config to issue in vtysh - note if you are moving from a pure IPv4 config you might want to stop the service and delete the FRR config file before doing this to reset the config.

ip forwarding
ipv6 forwarding
!
router ospf
 ospf router-id 0.0.0.X
 log-adjacency-changes
 exit
!
router ospf6
 ospf6 router-id 0.0.0.X
 log-adjacency-changes
 timers throttle spf 100 200 5000
 exit
!
interface lo
 ip ospf area 0
 ipv6 ospf6 area 0
 exit
!
interface en05
 ip ospf area 0
 ip ospf network broadcast
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
 exit
!
interface en06
 ip ospf area 0
 ip ospf network broadcast
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
 exit
!

to do: speed up failover by playing with dead time, hello time etc.

@nihr43

nihr43 commented Sep 24, 2023

Oh my bad, apparently frr does it a bit weird - you don't remove the address altogether, you duplicate the /32 lo address.

Cumulus Linux describes this using FRR.
..and frr says:

'''
When configuring a point-to-point network on an interface and the interface has a /32 address associated with then OSPF will treat the interface as being unnumbered. If you are doing this you must set the net.ipv4.conf.<interface name>.rp_filter value to 0.
'''

@scyto
Author

scyto commented Sep 24, 2023

Oh my bad

NP. you got me all excited, because it was a royal PITA to get the thunderbolt interfaces and IPs to remain on the same physical interface, lol :-)

Do you think i would have better luck going numberless if I switch to FRR OpenFabric (fabricd)?

@scyto
Author

scyto commented Sep 24, 2023

like would this work?

!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
!
interface eth0
 ip router openfabric 1
 ipv6 router openfabric 1
!
interface eth1
 ip router openfabric 1
 ipv6 router openfabric 1
!
router openfabric 1
 net 49.0000.0000.0001.00

or does it need the IP addresses like shown in the examples...

@scyto
Author

scyto commented Sep 24, 2023

i will try it just for IPv4 and see....

@scyto
Author

scyto commented Sep 24, 2023

yup that works, but the openfabric discovery time is soo slow

@scyto
Author

scyto commented Sep 24, 2023

however switchover when pulling a link was really fast; when the link came back it took maybe 20 to 30 seconds to switch back

@scyto
Author

scyto commented Sep 24, 2023

tl;dr

I now have IPv6 over OSPFv3 and IPv4 over OpenFabric (fabricd)

tomorrow i will play with switching over the IPv6 too....

pve3# show running-config 
Building configuration...

Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname pve3
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
exit
!
interface en06
 ip router openfabric 1
 ipv6 ospf6 area 0
 ipv6 ospf6 network broadcast
exit
!
interface lo
 ip router openfabric 1
 ipv6 ospf6 area 0
 openfabric passive
exit
!
router ospf6
 ospf6 router-id 0.0.0.3
 log-adjacency-changes
 timers throttle spf 100 200 5000
exit
!
router openfabric 1
 net 49.0000.0000.0003.00
exit
!

and

pve3# show openfabric topology 
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve3                                                                  
10.0.0.83/32         IP internal  0                                     pve3(4)
pve2                 TE-IS        10     pve2                 en05      pve3(4)
pve1                 TE-IS        10     pve1                 en06      pve3(4)
10.0.0.82/32         IP TE        20     pve2                 en05      pve2(4)
10.0.0.81/32         IP TE        20     pve1                 en06      pve1(4)

and

root@pve3:~# cat /etc/network/interfaces
...
auto lo
iface lo inet loopback

auto lo:0
iface lo:0 inet static
        address 10.0.0.83/32

auto lo:6
iface lo:6 inet static
        address fc00::83/128

iface enp86s0 inet manual

auto en05
iface en05 inet manual
        mtu 65520

auto en06
iface en06 inet manual
        mtu 65520

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.83/24
        gateway 192.168.1.1
        bridge-ports enp86s0
        bridge-stp off
        bridge-fd 0

once i have IPv6 working i will write a new gist to supersede this one as if this works well it is way simpler for folks....

@scyto
Author

scyto commented Sep 24, 2023

bah couldn't leave it alone... this looks good (i still have OSPFv3 configured too)

pve3# show openfabric topology 
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve3                                                                  
10.0.0.83/32         IP internal  0                                     pve3(4)
pve2                 TE-IS        10     pve2                 en05      pve3(4)
pve1                 TE-IS        10     pve1                 en06      pve3(4)
10.0.0.82/32         IP TE        20     pve2                 en05      pve2(4)
10.0.0.81/32         IP TE        20     pve1                 en06      pve1(4)

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve3                                                                  
fc00::83/128         IP6 internal 0                                     pve3(4)
pve2                 TE-IS        10     pve2                 en05      pve3(4)
pve1                 TE-IS        10     pve1                 en06      pve3(4)
fc00::82/128         IP6 internal 20     pve2                 en05      pve2(4)
fc00::81/128         IP6 internal 20     pve1                 en06      pve1(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent

any reason NOT to use openfabric?

@scyto
Author

scyto commented Sep 24, 2023

ok, does this seem a more elegant approach? numberless interfaces and simple frr config?

pve3# show running-config 
Building configuration...

Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname pve3
service integrated-vtysh-config
!
interface en05
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface en06
 ip router openfabric 1
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0003.00
exit
!
end

@scyto
Author

scyto commented Sep 24, 2023

ok i have now deprecated this gist
it is now replaced with https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085

@nihr43

nihr43 commented Sep 24, 2023

I have no familiarity with openfabric - but if it's working go for it.
On slow convergence - frr has a 'datacenter' profile that's supposed to use more aggressive timers. You enable it with frr defaults datacenter at the top of frr.conf. I don't know what its effect will be with openfabric but for me with bgp convergence is pretty much instant.
