Restore Rancher 2 cluster/node agents on clusters

This is an unsupported scenario; follow https://github.com/rancher/rancher/issues/14731 for progress on an official solution.

Use this when cattle-cluster-agent and/or cattle-node-agent have been accidentally deleted, or when the server-url or cacerts have been changed.

Generate definitions

  • Generate an API token in the UI (user -> API & Keys) and save the Bearer token
  • Find the cluster ID in the Rancher UI (format is c-xxxxx); it's in the address bar when the cluster is selected (an API-based lookup is sketched after the commands below)
  • Generate the agent definitions (needs curl and jq)
# Rancher URL
RANCHERURL="https://rancher.mydomain.com"
# Cluster ID
CLUSTERID="c-xxxxx"
# Token
TOKEN="token-xxxxx:xxxxx"
# Valid certificates
curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .command'
# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
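
If the UI is unavailable, the cluster ID can also be looked up via the API. This is a minimal sketch, assuming the same RANCHERURL and TOKEN variables as above; it lists each cluster's ID next to its name:

# List cluster IDs and names (assumes RANCHERURL and TOKEN are set as above)
curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusters" | jq -r '.data[] | "\(.id)\t\(.name)"'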

Apply definitions

The generated command needs to be executed using kubectl, configured with a kubeconfig that can talk to the cluster. Use the commands below to retrieve the kubeconfig and apply the definitions:

  1. Generate kubeconfig on node with controlplane role
docker run --rm --net=host \
  -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro \
  --entrypoint bash \
  $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
  2. Apply the definitions (replace the URL with the command returned from generating the definitions above)
docker run --rm --net=host \
  -v $PWD/kubeconfig_admin.yaml:/root/.kube/config \
  --entrypoint bash \
  $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'curl --insecure -sfL https://xxx/v3/import/dl75kfmmbp9vj876cfsrlvsb9x9grqhqjd44zvnfd9qbh6r7ks97sr.yaml | kubectl apply -f -'
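
Once the definitions are applied, the agents should re-register with Rancher. As a sanity check, the same container can be reused to watch the agent pods come up (a sketch following the pattern above; output will vary per cluster):

docker run --rm --net=host \
  -v $PWD/kubeconfig_admin.yaml:/root/.kube/config \
  --entrypoint bash \
  $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) \
  -c 'kubectl get pods -n cattle-system'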
@meappy commented Nov 21, 2019

Hi superseb

This guide helped me recreate cattle-cluster-agent and cattle-node-agent after I changed Rancher's server URL, although I had to modify some commands to get them to work for me. I run Rancher in a single standalone container with the rancher/rancher:latest image, and I'm on macOS with jq installed from brew.

FWIW, I initially tried modifying the manifest directly with $ kubectl -n cattle-system edit deploy cattle-cluster-agent, but that did not help my situation: it sent cattle-cluster-agent into a CrashLoopBackOff state.

Here is what I did; hopefully it helps someone else. I am running an EKS cluster.

$ curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | .command'
kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml

$ kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml --dry-run 
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver configured (dry run)
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master configured (dry run)
namespace/cattle-system configured (dry run)
serviceaccount/cattle configured (dry run)
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding configured (dry run)
secret/cattle-credentials-69ef1f7 configured (dry run)
clusterrole.rbac.authorization.k8s.io/cattle-admin configured (dry run)
deployment.apps/cattle-cluster-agent configured (dry run)
daemonset.apps/cattle-node-agent configured (dry run)
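
Note: on newer kubectl releases the bare --dry-run flag is deprecated; the client-side equivalent (assuming kubectl 1.18+) would be:

$ kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml --dry-run=client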

$ kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml 
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged
namespace/cattle-system unchanged
serviceaccount/cattle unchanged
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged
secret/cattle-credentials-69ef1f7 configured
clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged
deployment.apps/cattle-cluster-agent configured
daemonset.apps/cattle-node-agent configured

Looking good so far:

$ kubectl get pods -n cattle-system 
NAME                                    READY   STATUS    RESTARTS   AGE
cattle-cluster-agent-7c9b855dcd-9hv4h   1/1     Running   0          34m
cattle-node-agent-75q26                 1/1     Running   0          34m
cattle-node-agent-fkgm2                 1/1     Running   0          34m
cattle-node-agent-sgzw8                 1/1     Running   0          34m
@diego-lipinski-de-castro commented Oct 1, 2020

What is "wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8"?
