Created October 13, 2022 01:29
RKE2 Windows and Linux Troubleshooting and Debugging

RKE2 Windows Troubleshooting

High Level Notes

  • All RKE2 Windows clusters must run RKE2 v1.22.x or higher, because of a Calico 3.19.x bug in RKE2 v1.21.x whose fix Tigera will not backport.
  • The minor version of Calico was changed midway through the RKE2 v1.22 lifecycle.
    • rke2 v1.22.3+rke2r1 through v1.22.6+rke2r1 have Calico 3.20.x (3.20.1 for v1.22.3+rke2r1 only and then 3.20.2 until v1.22.7+rke2r1)
    • rke2 v1.22.7+rke2r1 and up have Calico 3.21.4 (or higher)
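
The version notes above can be sketched as a small lookup for choosing the matching calicoctl build. This is only an illustration: the function name `calico_line_for_rke2` is hypothetical, the v1.23.x to 3.22 mapping is taken from the calicoctl section further down, and any version these notes do not cover is reported as "unknown" rather than guessed.

```shell
# Map an RKE2 version string to the Calico release line listed in these notes.
# Prints 3.20, 3.21, 3.22, "unsupported" (v1.21.x and older, not usable for
# Windows clusters) or "unknown" (not covered by the notes).
calico_line_for_rke2() {
    ver="${1#v}"            # drop the leading "v"
    ver="${ver%%[+-]*}"     # drop "+rke2rN" and any "-rcN" suffix
    case "$ver" in
        [0-9]*.[0-9]*.[0-9]*) ;;
        *) echo "unknown"; return 1 ;;
    esac
    major="${ver%%.*}"; rest="${ver#*.}"
    minor="${rest%%.*}"; patch="${rest#*.}"
    # reject anything that is not purely numeric
    case "${major}${minor}${patch}" in
        *[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$major" -ne 1 ]; then
        echo "unknown"; return 1
    elif [ "$minor" -le 21 ]; then
        echo "unsupported"
    elif [ "$minor" -eq 22 ] && [ "$patch" -ge 3 ] && [ "$patch" -le 6 ]; then
        echo "3.20"
    elif [ "$minor" -eq 22 ] && [ "$patch" -ge 7 ]; then
        echo "3.21"
    elif [ "$minor" -eq 23 ]; then
        echo "3.22"
    else
        echo "unknown"; return 1
    fi
}
```

For example, `calico_line_for_rke2 v1.22.5+rke2r1` prints 3.20, so the 3.20.x calicoctl is the one to download.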

Ensure that Docker is disabled before installing RKE2 Windows on custom clusters

stop-process -Name dockerd -Force
stop-service docker
set-service docker -startuptype disabled

RKE2 Specific Debugging



calicoctl for 3.20.x calico (rke2 v1.22.3 -> v1.22.6)

curl -o /usr/local/bin/calicoctl -O -L  "" 
chmod +x /usr/local/bin/calicoctl

calicoctl for 3.21.x calico (rke2 v1.22.7+)

curl -o /usr/local/bin/calicoctl -O -L  ""
chmod +x /usr/local/bin/calicoctl

calicoctl for 3.22.x (rke2 v1.23.x)

curl -o /usr/local/bin/calicoctl -O -L ""
chmod +x /usr/local/bin/calicoctl

Expected output

calicoctl ipam show --show-configuration
|      PROPERTY      | VALUE |
| StrictAffinity     | true  |
| AutoAllocateBlocks | true  |
| MaxBlocksPerHost   |     0 |

Note: If StrictAffinity is set to false, you may be running an outdated version of rke2 that had a bug in how Calico was deployed via rke2-charts. This was fixed in v1.21.3-rc6+rke2r2.
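
To check the StrictAffinity row without reading the table by eye, a small filter along these lines may help. It is a sketch only: the function name `strict_affinity_enabled` is hypothetical, and it assumes the pipe-delimited table layout shown in the expected output above.

```shell
# Read `calicoctl ipam show --show-configuration` output on stdin and
# succeed only when the StrictAffinity row reports true.
strict_affinity_enabled() {
    awk -F'|' '
        {
            name = $2; value = $3
            gsub(/[ \t]/, "", name)    # strip table padding
            gsub(/[ \t]/, "", value)
        }
        name == "StrictAffinity" { found = 1; ok = (value == "true") }
        END { exit (found && ok) ? 0 : 1 }
    '
}
```

Usage: `calicoctl ipam show --show-configuration | strict_affinity_enabled && echo "StrictAffinity OK"`. A missing row is treated the same as false.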

How to fix PATH issues


fix for current session

export PATH=$PATH:/var/lib/rancher/rke2/bin/
export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
crictl config --set runtime-endpoint=unix:///run/k3s/containerd/containerd.sock
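
If kubectl or crictl still fail after these exports, it can be worth confirming that the paths they point at exist. The helper below is a minimal sketch: the name `rke2_missing_paths` and its optional root-prefix argument are hypothetical (the prefix only exists so the check can be tried against a copied directory tree), and the path list is simply the three paths used above.

```shell
# Print each RKE2 path used by the exports above that does not exist.
# $1 is an optional root prefix (hypothetical, for dry runs); leave it
# empty on a real node.
rke2_missing_paths() {
    root="${1:-}"
    for p in \
        /var/lib/rancher/rke2/bin \
        /etc/rancher/rke2/rke2.yaml \
        /var/lib/rancher/rke2/agent/etc/crictl.yaml
    do
        [ -e "${root}${p}" ] || echo "${p}"
    done
}
```

Running `rke2_missing_paths` with no argument prints nothing when all three paths are present.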

fix for future sessions

# quote the heredoc delimiter so $PATH is expanded at login, not when this command runs
cat >> /etc/profile <<'EOF'
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
export PATH="$PATH:/var/lib/rancher/rke2/bin"
crictl config --set runtime-endpoint=unix:///run/k3s/containerd/containerd.sock
alias k=kubectl
EOF

Windows ref


[Environment]::SetEnvironmentVariable(
    "Path",
    [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine) + ";c:\var\lib\rancher\rke2\bin;c:\usr\local\bin",
    [EnvironmentVariableTarget]::Machine)

crictl runtime endpoint issues on Windows

The preferred method

$Env:CRI_CONFIG_FILE = "C:\var\lib\rancher\rke2\agent\etc\crictl.yaml"
crictl.exe ps -a

The backup method in the case of a misconfigured or missing crictl.yaml file

$Env:CONTAINER_RUNTIME_ENDPOINT = "npipe:////./pipe/containerd-containerd"
crictl.exe ps -a

Checking Windows RKE2 Agent logs

# check rke2 agent event logs
# get all logs
Get-EventLog -LogName Application -Source 'rke2' | select-object -Property ReplacementStrings,TimeWritten | Format-Table -Wrap -Autosize

# get last 50
Get-EventLog -LogName Application -Source 'rke2'  -Newest 50 | select-object -Property ReplacementStrings,TimeWritten | Format-Table -Wrap -Autosize

# extract the command-line args when an exe was run
# can swap in rke2.exe, kubelet.exe, kube-proxy.exe, etc
# any exe in the C:\var\lib\rancher\rke2\bin should work for this command
Get-WmiObject Win32_Process -Filter "name = 'containerd.exe'" | Select-Object CommandLine

Checking Windows rancher-wins Service logs

# Wins
Get-WmiObject win32_service | ?{$_.Name -like '*rancher-wins*'} | Select-Object -Property * | Format-Table -wrap -AutoSize

Get-WmiObject win32_service | ?{$_.Name -like '*rancher-wins*'} | select Name, DisplayName, PathName | Format-Table -wrap -AutoSize

Get-EventLog -LogName Application -Newest 20 -Source 'rancher-wins' 

vSphere Node Driver for RKE2 Windows Specific Debugging

# cloudbase
Get-EventLog -LogName 'Windows PowerShell' -Message *cloudbase* 
Get-EventLog -LogName System -Message *cloudbase* 
Get-EventLog -LogName Application -Message *cloudbase* 

Get-EventLog -LogName 'Windows PowerShell' -Message *cloudbase* | Select-Object -Property * | Format-Table -Wrap -Autosize

How to check named pipes

# get a list of all open named pipes (returns a list of objects)
get-childitem \\.\pipe\

# return only the full pipe names
(get-childitem \\.\pipe\).FullName

General System information

Helpful Articles
Windows container requirements
What's new for Windows containers in Windows Server 2022

# get system info

# get current build version of windows

### How to Curl properly
# curl aliases to Invoke-WebRequest (iwr)
# long version: iwr -UseBasicParsing -Verbose -Uri
iwr -useb -v -uri

### How to use native curl.exe, which is a cross-compiled curl for windows

curl.exe -v

# get the system environment variables
# most notably in here are the RKE variables and any proxy settings

Get-ChildItem env:

# windows get release ID
# returns 1809, 1903, 1909, 2004, 2009; the value stays at 2009 on 20H2 and later releases, including Server 2022
(Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion" -Name ReleaseId).ReleaseId

# get the cpu usage of all processes, sorted descending (top 10)
Get-Counter '\Process(*)\% Processor Time' | Select-Object -ExpandProperty CounterSamples | Select-Object -Property InstanceName, CookedValue | ? {$_.InstanceName -notmatch "^(idle|_total|system)$"} | Sort-Object -Property CookedValue -Descending | Select-Object -First 10 | ft InstanceName,@{L='CPU';E={($_.CookedValue/100/$env:NUMBER_OF_PROCESSORS).toString('P')}} -AutoSize

# get windows PATH with wrapped output
Get-ChildItem env:PATH | Format-Table -Wrap -Autosize

# start a new powershell admin session
powershell -Command "Start-Process PowerShell -Verb RunAs"
# OR
Start-Process PowerShell -Verb RunAs

RKE2 Specific Debugging Commands

# check rke2 agent event logs
Get-Eventlog -LogName Application -Source rke2 | Select-Object -Property Message | Format-List


Windows Defender and Windows Firewall

# Suggested Microsoft Defender configuration for RKE2 Windows Agent
# This configuration is not *necessarily* production ready

# The exclusion parameters take string arrays, so pass real lists instead of one comma-joined string
$names = "calico-node", "containerd", "kubelet", "kube-proxy", "host-local", "calico-ipam", "containerd-shim-runhcs-v1", "ctr", "win-overlay", "crictl"
$paths = @("c:\usr\local\bin\rke2.exe") + ($names | ForEach-Object { "c:\var\lib\rancher\rke2\bin\$_.exe" })

Set-MpPreference -DisableRealtimeMonitoring $true -DisableScriptScanning $true -DisableArchiveScanning $true -AttackSurfaceReductionOnlyExclusions "c:\var\lib\rancher\rke2\bin", "c:\usr\local\bin" -ScanAvgCPULoadFactor 10 -ExclusionPath $paths -ControlledFolderAccessAllowedApplications "C:\usr\local\bin\rke2.exe" -ExclusionProcess (@("rke2") + $names)

if ($env:CATTLE_SERVER) {
    Add-MpPreference -ExclusionIpAddress "$env:CATTLE_SERVER"
    Add-MpPreference -ExclusionPath (@("rke2") + $names | ForEach-Object { "$env:CATTLE_AGENT_BIN_PREFIX\bin\$_.exe" })
}

# Verify our defender preferences were set

# How to Disable Windows Firewall for all Firewall Profiles

Set-NetFirewallProfile -Profile Domain,Public,Private -Enabled False

Windows Networking

Helpful Articles
Introducing the Host Compute Service (HCS)
Windows container networking

# Display the common properties for the specified network adapter

Get-NetAdapter -Name "*"
Get-NetAdapter -Name "vEthernet (nat)"
Get-NetAdapter -Name "Ethernet 3" | Format-List -Property *

# Get the network routes for a given interface index (from the get-netadapter output)

Get-NetRoute -InterfaceIndex 10

# different methods of querying network devices/IPs/Interfaces


### General Network Troubleshooting

ipconfig /allcompartments /all        

# check MTU
netsh interface ipv4 show subinterface

# Routes
netstat -r
netsh interface ipv4 show route

# Statistics
netstat -es

# Active Connections
netstat -qb

Microsoft HNS (Host Network System)

#### You can query HNS resources using the hnsdiag executable (Hyper-V Host Network Service Diagnostics Tool) or PowerShell cmdlets.

#### The PowerShell cmdlets are recommended because they offer more functionality.


  hnsdiag <command> <object> [options ...]

   list <object>
     Lists the specified object(s).

   delete <object> <id>
     Delete the specified object.

     All  (only valid when used with list)

           Detailed option, when used with list, dumps the json of the object

### HNS Networks

# Get all HNS networks and details

# get the nat HNS network
Get-HnsNetwork | where {$_.Name -eq "nat"}

# get the calico HNS network
Get-HnsNetwork | where {$_.Name -eq "calico"}

# get the external HNS Network for Calico
Get-HnsNetwork | where {$_.Name -eq "external"}

### HNS Endpoints


Get-HnsEndpoint | where {$_.IPAddress -eq  ""}

### HNS Policies


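# check HNS policies against endpoints
# the original loop body was cut off; reporting references with no matching endpoint is an assumed intent

# endpoint IDs referenced by the policy lists, with the "/endpoints/" prefix removed
$refs = Get-HnsPolicyList | ForEach-Object { $_.References } | ForEach-Object { $_ -replace "/endpoints/", "" }
$eps = (Get-HnsEndpoint).ID
foreach ($ref in $refs) {
    if ($eps -notcontains $ref) { Write-Output "policy references missing endpoint: $ref" }
}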

# extract shared container ID and encapsulation overhead for all HNS endpoints
get-hnsendpoint | select-object -property encapoverhead, sharedcontainers

Checking Windows Server proxy settings

Get-ItemProperty -Path "Registry::HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings"
Get-ChildItem env: | findstr PROXY
Get-ChildItem env: | findstr proxy
netsh winhttp show proxy

Adding a proxy to Windows Server

netsh winhttp set proxy <proxy>:<port>
$env:HTTP_PROXY = "<proxy>:<port>"
$env:HTTPS_PROXY = "<proxy>:<port>"
$env:NO_PROXY = "localhost,,*"
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "<proxy>:<port>", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "<proxy>:<port>", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("NO_PROXY", "localhost,,*", [EnvironmentVariableTarget]::Machine)

# same steps with an extended NO_PROXY list
netsh winhttp set proxy <proxy>:<port>
$env:HTTP_PROXY = "<proxy>:<port>"
$env:HTTPS_PROXY = "<proxy>:<port>"
$env:NO_PROXY = "localhost,,,,cattle-system.svc"
[Environment]::SetEnvironmentVariable("HTTP_PROXY", "<proxy>:<port>", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("HTTPS_PROXY", "<proxy>:<port>", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("NO_PROXY", "localhost,,,,,cattle-system.svc", [EnvironmentVariableTarget]::Machine)
Set-ItemProperty -path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" ProxyEnable -value 1
Set-ItemProperty -path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" ProxyServer -value "https=$env:HTTPS_PROXY;http=$env:HTTP_PROXY"
Set-ItemProperty -path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" ProxyOverride -value $env:NO_PROXY.Replace(',',';')

# If you are using an FTP proxy, uncomment and set the following in addition to what is above
#set FTP_PROXY=<proxy>:<port>
#[Environment]::SetEnvironmentVariable("FTP_PROXY", "<proxy>:<port>", [EnvironmentVariableTarget]::Machine)
#Set-ItemProperty -path "HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings" ProxyServer -value "https=$env:HTTPS_PROXY;http=$env:HTTP_PROXY;ftp=$env:FTP_PROXY"

Multi-platform Tests for Standalone RKE2 (useful for testing Calico Network Policies)

This deploys a PowerShell Core application on each Linux and Windows node. Once the workload is running, its pods respond on port 3000/tcp.

kubectl apply -f

How to test on Linux RKE2 Nodes

# prep 
export PATH=$PATH:/var/lib/rancher/rke2/bin/
export KUBECONFIG="/etc/rancher/rke2/rke2.yaml"
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
crictl config --set runtime-endpoint=unix:///run/k3s/containerd/containerd.sock

# exec into pstools
crictl exec -it <CONTAINER_ID> pwsh

# run inside the pod
Invoke-RestMethod <LINUX_OR_WINDOWS_POD_IP>:3000

# or use curl.exe
curl.exe -L <LINUX_OR_WINDOWS_POD_IP>:3000

How to test on Windows RKE2 Nodes

# prep
$Env:CRI_CONFIG_FILE = "C:\var\lib\rancher\rke2\agent\etc\crictl.yaml"
$Env:CONTAINER_RUNTIME_ENDPOINT = "npipe:////./pipe/containerd-containerd"

# exec into pstools
crictl exec -it <CONTAINER_ID> pwsh

# run inside the pod
Invoke-RestMethod <LINUX_OR_WINDOWS_POD_IP>:3000

# or use curl.exe
curl.exe -L <LINUX_OR_WINDOWS_POD_IP>:3000
