
Build a Hyper-converged Proxmox HA Cluster with Ceph

Part of collection: Hyper-converged Homelab with Proxmox

This is part 1, focusing on the networking part of building a highly available Proxmox cluster with Ceph.

Part 2 focuses on building the Proxmox cluster and setting up Ceph itself, and part 3 focuses on managing and troubleshooting Proxmox and Ceph.

Why

For some time, I was looking for options to build a Hyper-converged Infrastructure (HCI) Homelab, considering options like TrueNAS Scale and SUSE Harvester, among others.

But then I discovered that I could build a Hyper-converged Infrastructure with Proxmox, the virtualization software that I was already using.


What

This setup creates a highly available (resilient to physical failures, cable disconnects, etc.) and high-speed, IPv6-only full-mesh communication channel between the Proxmox/Ceph nodes. This network is separate from the management network, and only Ceph and internal cluster traffic flows over it.

Environment Diagram

(Topology diagram credit: John W Kerns, Packet Pushers)

For more background details on this setup see this excellent guide: Proxmox/Ceph – Full Mesh HCI Cluster w/ Dynamic Routing, on which I based my implementation.

Important Information

The official documentation is limited, but fortunately the previously mentioned guide (Proxmox/Ceph – Full Mesh HCI Cluster w/ Dynamic Routing) helped me to get this up and running with confidence that I could replicate the setup in case of a rebuild.

Assumptions

This guide starts with the following assumptions

  • 3x servers for the HCI cluster
  • Fresh installation of Proxmox 8.0 or later
  • The “No-Subscription” or other update repository is set up on each server
  • Servers have been updated with the latest patches using apt or apt-get
  • Servers are connected to a management network, and you have access to the Proxmox GUI as well as root SSH access
  • Cluster links between nodes are connected in a ring topology as shown in the diagram
  • Proxmox cluster has not yet been configured
  • Ceph has not been installed or configured
  • Hosts have no VMs or there are no VMs currently running (you have the ability to perform reboots, etc.)

NOTE: All commands are being run as root!

Generic Values

  • Number of cluster links on each node: 2
  • Cluster Name: Homelab
  • Some of my interfaces have generic Ethernet device names like eno1, but I also use 2.5 Gbit USB-C to Ethernet dongles for the second mesh link. That's why I labelled all the cables: to make sure they get plugged back into the correct server in case I ever take them (all) out! The command below lists the device names on a node.
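To list the Ethernet device names on a node (and spot the enx… names of the USB dongles), this one-liner is enough:

# Brief overview of all network devices: name, state, MAC address
ip -br link show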

Node #1:

  • Name: pve01
  • FQDN: pve01.example.com
  • Proxmox Management IP address: 192.168.1.11
  • Cluster IPv6 address: fc00::1

Node #2:

  • Name: pve02
  • FQDN: pve02.example.com
  • Proxmox Management IP address: 192.168.1.12
  • Cluster IPv6 address: fc00::2

Node #3:

  • Name: pve03
  • FQDN: pve03.example.com
  • Proxmox Management IP address: 192.168.1.13
  • Cluster IPv6 address: fc00::3

Notes:

  • The fc00:: IPv6 addresses are Unique Local Addresses and don’t necessarily have to be replaced with something else unless you are using, or plan to use, those addresses elsewhere in your environment.
  • I actually have four nodes, but for simplicity I base this guide on 3 nodes.
  • This is also possible over Thunderbolt 3 / 4 interfaces, for speeds of 10 Gbit or even higher. I should have known this earlier, but unfortunately I only found this Gist from Scyto when it was too late.

Setup

Let's get to work and get this set up!

Turn on Links

The first step is to figure out which 2.5 Gbit devices are going to be used for the routing mesh and turn them on. For that, run lldpctl and look for the adapters (MAU oper type:); in my situation this is 2p5GigT, but when using 1 or 10 Gbit it might display something like 1GigT or 10GigT.

Install LLDP

  • Logged into SSH as root, install the LLDP daemon with apt install lldpd -y
  • Once complete, run lldpctl to see your neighbour nodes over the cluster interfaces
    • This ensures your links are up and connected in the way you want
    • You should see something like the below

The lldpctl output looks like:

-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    eno1, via: LLDP, RID: 2, Time: 0 day, 00:00:05
  Chassis:
    ChassisID:    mac 54:b2:03:fd:44:e7
    SysName:      pve04.soholocal.nl
    SysDescr:     Debian GNU/Linux 12 (bookworm) Linux 6.2.16-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-10 (2023-08-18T11:42Z) x86_64
    MgmtIP:       192.168.1.14
    MgmtIface:    5
    MgmtIP:       fe80::a2ce:c8ff:fe9b:c0b9
    MgmtIface:    5
    Capability:   Bridge, on
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, off
  Port:
    PortID:       mac 00:e0:4c:68:00:59
    PortDescr:    enx00e04c680059
    TTL:          120
    PMD autoneg:  supported: no, enabled: no
      MAU oper type: 2p5GigT - 2.5GBASE-T Four-pair twisted-pair balanced copper cabling PHY
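To pull just the interface names and reported link speeds out of the lldpctl output, a simple filter like this can help (adjust the pattern to your output):

# Show only the neighbour interface lines and the MAU operating type (link speed)
lldpctl | grep -E 'Interface:|MAU oper type:'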

Once you know which interface devices are part of the mesh:

  • On each node, log into the Proxmox GUI and navigate to System > Network
  • Edit the interfaces and make sure the “Autostart” checkbox is checked
  • Hit Apply Configuration on the network page if you had to make any changes; the equivalent change from the shell is sketched below
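If you prefer doing this from the shell instead of the GUI, here is a minimal sketch of the equivalent /etc/network/interfaces entries; the Autostart checkbox corresponds to the auto line, and the interface names are the ones from my pve01, so replace them with your own. Apply the change with ifreload -a (Proxmox ships ifupdown2).

auto eno1
iface eno1 inet manual

auto enx00e04c680048
iface enx00e04c680048 inet manual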

My Network configuration looked like this:

pve01

Mesh Devices: eno1 and enx00e04c680048 (screenshot of the Network page)

pve02

Mesh Devices: eno1 and enx00e04c680001 (screenshot of the Network page)

pve03

Mesh Devices: eno1 and enx00e04c680101 (screenshot of the Network page)

If everything is set up correctly, each node should display its 2 neighbour nodes via lldpctl.

Create Loopbacks

  • On each node, using the SSH terminal, edit the interfaces file: nano /etc/network/interfaces
  • Add the below interface definition
    • The ::1 number should represent your node number
    • Change to ::2 for node 2, ::3 for node 3, etc.
    • This will be the unique IP address of this node for Ceph and Proxmox cluster services

Add the configuration below to each node, changing only the fc00::1/128 value per node.

auto lo:0
iface lo:0 inet static
        address fc00::1/128

The section in /etc/network/interfaces should now look like the snippet above (screenshot of the interfaces file).

  • Save and close the file.
  • Restart network services to apply the changes: systemctl restart networking.service && systemctl status networking.service.
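To confirm the loopback address came up after the restart, a quick check (fc00::1 shown for node 1; expect your own node's address):

# The node's cluster address should be listed on the lo interface
ip -6 addr show dev lo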

Enable IPv6 Forwarding

When the cluster ring is broken, some nodes will need to communicate with each other by routing through a neighbour node. To allow this to happen, we need to enable IPv6 forwarding in the Linux kernel.

On each node, edit the sysctl file: nano /etc/sysctl.conf

Uncomment the line:

#net.ipv6.conf.all.forwarding=1

To make it look like

net.ipv6.conf.all.forwarding=1

  • Save and close the file
  • Set the live IPv6 forwarding state with: sysctl net.ipv6.conf.all.forwarding=1
  • Check that the Linux kernel is set to forward IPv6 packets: sysctl net.ipv6.conf.all.forwarding
  • Output should be: net.ipv6.conf.all.forwarding = 1
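If you prefer to script this step, here is a minimal sketch (assuming the line is still present in /etc/sysctl.conf in its default commented-out form):

# Uncomment the IPv6 forwarding line and reload the sysctl settings
sed -i 's/^#net.ipv6.conf.all.forwarding=1/net.ipv6.conf.all.forwarding=1/' /etc/sysctl.conf
sysctl -p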

Set Up Free Range Routing (FRR) OSPF

  • On each node, install FRR with: apt install frr
  • Edit the FRR config file: nano /etc/frr/daemons
  • Adjust ospf6d=no to ospf6d=yes and save the file
  • Restart FRR: systemctl restart frr.service
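The daemons file edit can also be scripted; a minimal sketch, assuming the default ospf6d=no line in /etc/frr/daemons:

# Enable the OSPFv3 daemon and restart FRR
sed -i 's/^ospf6d=no/ospf6d=yes/' /etc/frr/daemons
systemctl restart frr.service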

Configuration

  • On each node, enter the FRR shell: vtysh
  • Check the current config: show running-config
  • Enter config mode: configure
  • Apply the below configuration
    • The 0.0.0.1 number should represent your node number
    • Change to 0.0.0.2 for node 2, 0.0.0.3 for node 3, etc.
    • Replace the ens3f0 and ens3f1 interface names with your own two mesh interface names (in my case eno1 and the enx… USB adapter)
router ospf6
 ospf6 router-id 0.0.0.1
 log-adjacency-changes
 exit
!
interface lo
 ipv6 ospf6 area 0
 exit
!
interface ens3f0
 ipv6 ospf6 area 0
 ipv6 ospf6 network point-to-point
 exit
!
interface ens3f1
 ipv6 ospf6 area 0
 ipv6 ospf6 network point-to-point
 exit
!
  • Exit the config mode: end
    • After pasting the configuration, hit enter and then type end on the line below the last exclamation mark!
  • Save the config: write memory
  • type exit to leave vtysh.
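The same configuration can also be applied non-interactively with vtysh -c; below is a sketch for node 2 (router-id 0.0.0.2), assuming its two mesh interfaces are eno1 and enx00e04c680001 (adjust the router-id and interface names per node):

vtysh \
  -c 'configure terminal' \
  -c 'router ospf6' -c 'ospf6 router-id 0.0.0.2' -c 'log-adjacency-changes' -c 'exit' \
  -c 'interface lo' -c 'ipv6 ospf6 area 0' -c 'exit' \
  -c 'interface eno1' -c 'ipv6 ospf6 area 0' -c 'ipv6 ospf6 network point-to-point' -c 'exit' \
  -c 'interface enx00e04c680001' -c 'ipv6 ospf6 area 0' -c 'ipv6 ospf6 network point-to-point' -c 'exit' \
  -c 'end' -c 'write memory'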

Verification

Once you have done this on all nodes, check for the OSPF6 neighbours:

  • Run vtysh -c 'show ipv6 ospf6 neighbor' from the Bash/Zsh shell, or show ipv6 ospf6 neighbor while still in vtysh.

You should see two neighbours on each node:

pve01# show ipv6 ospf6 neighbor
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
0.0.0.4           1    00:00:33     Full/PointToPoint  1d15:14:19 eno1[PointToPoint]
0.0.0.2           1    00:00:32     Full/PointToPoint  1d15:14:39 enx00e04c680048[PointToPoint]
  • Show the IPv6 routing table in FRR: vtysh -c 'show ipv6 route'
    • You should see the IPv6 addresses of your neighbours in the table as OSPF routes

(screenshot: 'show ipv6 route' output in vtysh)

  • Show the IPv6 routing table in Linux: ip -6 route

(screenshot: 'ip -6 route' output)

Note: This example is from my node pve01; the direct neighbours pve02 (fc00::2/128) and pve04 (fc00::4/128) are shown, but because I have a 4-node setup, the node pve03 (fc00::3/128) is also visible.

  • Check that the Linux kernel is set to forward IPv6 packets: sysctl net.ipv6.conf.all.forwarding
    • Output should be: net.ipv6.conf.all.forwarding = 1

Test OSPF Routing & Redundancy

We should now have full reachability between our loopback interfaces. Let’s test it.

  • Ping every node from every node: ping fc00::1
    • Replace the IP to make sure you have reachability between all nodes
  • Check your neighbours are all up: vtysh -c 'show ipv6 ospf6 neighbor'
  • Pick one of your nodes and shut down one of its mesh links: ip link set eno1 down
    • Or you can pull out a cable if you prefer the real-world test
    • DO NOT do this on all nodes, just on one
  • Check that the link is down: ip link
  • Check your neighbours, should only have one on this node: vtysh -c 'show ipv6 ospf6 neighbor'
  • Ping every node from every node AGAIN: ping fc00::1
    • This should still work; traffic will route through one of the other nodes to reach the detached one
  • Check your routing table: ip -6 route
    • You will see the links used reflect the routing path
  • Bring the downed link back up: ip link set eno1 up
    • Or plug the cable back in
    • The routing table should change back after approx 15 seconds
  • Ping every node from every node ONE LAST TIME: ping fc00::1
    • Make sure the system is working properly
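A small loop makes the all-to-all ping test quicker to repeat; a minimal sketch, assuming the three loopback addresses from this guide:

# Ping every cluster loopback address a few times from the current node
for ip in fc00::1 fc00::2 fc00::3; do
    echo "--- pinging $ip ---"
    ping -c 3 "$ip"
done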

Update the Hosts File

  • Edit the hosts file: nano /etc/hosts (or Use the Proxmox GUI)
  • Add the below lines to the file:
fc00::1 pve01.soholocal.nl pve01
fc00::2 pve02.soholocal.nl pve02
fc00::3 pve03.soholocal.nl pve03
fc00::4 pve04.soholocal.nl pve04
  • Ping each host by name and make sure the IPv6 address is used.
  • Reboot each server.
  • Once back online, ping each host by name again and make sure the IPv6 address is used.
  • Perform all the routing and redundancy tests above again to make sure a reboot does not break anything.
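To double-check that the names resolve to the cluster IPv6 addresses rather than the management IPs, getent can list what the resolver returns; a quick sketch:

# Each name should list its fc00:: address from /etc/hosts
getent ahosts pve01
getent ahosts pve02
getent ahosts pve03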

If everything checks out, your system and you are ready for Write-up: Build a Hyper-converged Proxmox Cluster with Ceph (part 2).

Issue

Network interface enx00e04c680059 Down

Only on my Intel NUC, the enx00e04c680059 network interface crashes about once a day:

r8152 2-3:1.0 enx00e04c680059: Tx status -108

I'm not sure why yet; as a workaround, I created a network-check.sh script that runs every 15 minutes and brings the interface up again if it's down.

  • Create a crontab entry with crontab -e and add the line below; the script itself (stored at /root/scripts/network-check/network-check.sh) follows after it.

*/15 * * * * /root/scripts/network-check/network-check.sh > /dev/null 2>&1


#!/bin/bash

# Define the network interface name
interface="enx00e04c680059"

# Log directory
log_dir="/root/scripts/network-check/network_logs"

# Ensure the log directory exists
mkdir -p "$log_dir"

# Log file with today's date
LOG_FILE="$log_dir/network_check_$(date +"%Y-%m-%d").log"

# Function to perform log rotation (delete logs older than 7 days)
rotate_logs() {
    find "$log_dir" -name "network_check_*.log" -mtime +7 -exec rm {} \;
}

# Check if the network interface is up
if ip link show dev "$interface" | grep -q "UP,LOWER_UP"; then
    echo "Network interface $interface is already up."
    #echo "$(date) - Network interface $interface is already up." >> "$LOG_FILE"
else
    # Bring the network interface up
    ip link set dev "$interface" up
    if [ $? -eq 0 ]; then
        echo "Network interface $interface has been brought up successfully."
        echo "$(date) - Network interface $interface has been brought up successfully." >> "$LOG_FILE"
        # Give OSPF a moment to re-establish adjacency, then log the neighbour state
        sleep 15
        vtysh_output=$(vtysh -c 'show ipv6 ospf6 neighbor')
        echo "vtysh output:" >> "$LOG_FILE"
        echo "$vtysh_output" >> "$LOG_FILE"
    else
        echo "Failed to bring up network interface $interface."
        echo "$(date) - Failed to bring up network interface $interface." >> "$LOG_FILE"
    fi
fi

# Perform log rotation
rotate_logs