Restore Rancher 2 cluster/node agents on clusters

This is an unsupported scenario; watch rancher/rancher#14731 for an official solution.

Use this when cattle-cluster-agent and/or cattle-node-agent have been accidentally deleted, or when the server-url/cacerts have been changed.

Generate definitions

  • Generate API token in the UI (user -> API & Keys) and save the Bearer token
  • Find the cluster ID in the Rancher UI (format is c-xxxxx); it's in the address bar when the cluster is selected
  • Generate agent definitions (needs curl, jq)
# Rancher URL
RANCHERURL="https://rancher.mydomain.com"
# Cluster ID
CLUSTERID="c-xxxxx"
# Token
TOKEN="token-xxxxx:xxxxx"
# Valid certificates
curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .command'
# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
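For reference, the jq filter above skips the internal `system` registration token and prints the apply command of the remaining one(s). A minimal sketch against a made-up, trimmed response (the names and URLs are illustrative, not real API output):

```shell
# Hypothetical, trimmed /v3/clusterregistrationtokens response for illustration
cat <<'EOF' > /tmp/regtokens.json
{"data": [
  {"name": "system",  "command": "kubectl apply -f https://rancher.mydomain.com/v3/import/system.yaml"},
  {"name": "default", "command": "kubectl apply -f https://rancher.mydomain.com/v3/import/xxxxx.yaml"}
]}
EOF
# Same filter as above: drop the "system" token, print the apply command
jq -r '.data[] | select(.name != "system") | .command' /tmp/regtokens.json
```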

Apply definitions

The generated command needs to be executed using kubectl configured with a kubeconfig to talk to the cluster. See the gists below to retrieve the kubeconfig:

  1. Generate kubeconfig on node with controlplane role
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
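The trailing sed in the one-liner above rewrites whatever `server:` address the extracted kubeconfig contains, so kubectl talks to the local kube-apiserver on the node. In isolation it behaves like this (sample input address made up):

```shell
# Sample kubeconfig server line (any address); the sed rewrites everything
# after the first colon so it points at the local apiserver
printf '    server: "https://10.1.2.3:6443"\n' \
  | sed -e '/^[[:space:]]*server:/ s_:.*_: "https://127.0.0.1:6443"_'
# -> prints:     server: "https://127.0.0.1:6443"
```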
  2. Apply definitions (replace the URL with the command returned from generating the definitions)
docker run --rm --net=host -v $PWD/kubeconfig_admin.yaml:/root/.kube/config --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'curl --insecure -sfL https://xxx/v3/import/dl75kfmmbp9vj876cfsrlvsb9x9grqhqjd44zvnfd9qbh6r7ks97sr.yaml | kubectl apply -f -'
meappy commented Nov 21, 2019

Hi superseb

This guide helped me recreate cattle-cluster-agent and cattle-node-agent after I changed Rancher's server URL. However, I had to modify some commands to make them work for me. I run Rancher in a single standalone container with the rancher/rancher:latest image. I'm on macOS with jq installed from brew.

FWIW, I initially tried modifying the manifest directly ($ kubectl -n cattle-system edit deploy cattle-cluster-agent), but that did not help my situation; it put cattle-cluster-agent into a CrashLoopBackOff state.

Here is what I did, hopefully this might help someone else. I am running an EKS cluster.

$ curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | .command'
kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml

$ kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml --dry-run 
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver configured (dry run)
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master configured (dry run)
namespace/cattle-system configured (dry run)
serviceaccount/cattle configured (dry run)
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding configured (dry run)
secret/cattle-credentials-69ef1f7 configured (dry run)
clusterrole.rbac.authorization.k8s.io/cattle-admin configured (dry run)
deployment.apps/cattle-cluster-agent configured (dry run)
daemonset.apps/cattle-node-agent configured (dry run)

$ kubectl apply -f https://example.com/v3/import/wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8.yaml 
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged
namespace/cattle-system unchanged
serviceaccount/cattle unchanged
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged
secret/cattle-credentials-69ef1f7 configured
clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged
deployment.apps/cattle-cluster-agent configured
daemonset.apps/cattle-node-agent configured

Looking good so far

$ kubectl get pods -n cattle-system 
NAME                                    READY   STATUS    RESTARTS   AGE
cattle-cluster-agent-7c9b855dcd-9hv4h   1/1     Running   0          34m
cattle-node-agent-75q26                 1/1     Running   0          34m
cattle-node-agent-fkgm2                 1/1     Running   0          34m
cattle-node-agent-sgzw8                 1/1     Running   0          34m

@diego-lipinski-de-castro

what is "wxw9jp98v9pcvplf6snhxgtwjfpxtn9vdft5c8kl5sfpbw2scrh8l8"?


MSandro commented Jul 5, 2021

Hi, thanks for the guide, but I get some forbidden errors:

[root@v10p01c ~]# docker run --rm --net=host -v $PWD/kubeconfig_admin.yaml:/root/.kube/config --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl apply -f https://example.com/v3/import/jw9wp6bl4t4f6mjdjt5trmkxcn7g7vn5r6sslvg9zz9lkt785lfl27_c-8pvmt.yaml'
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged
namespace/cattle-system unchanged
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged
clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged
Error from server (Forbidden): error when creating "https://example.com/v3/import/jw9wp6bl4t4f6mjdjt5trmkxcn7g7vn5r6sslvg9zz9lkt785lfl27_c-8pvmt.yaml": serviceaccounts "cattle" is forbidden: unable to create new content in namespace cattle-system because it is being terminated
Error from server (Forbidden): error when creating "https://example.com/v3/import/jw9wp6bl4t4f6mjdjt5trmkxcn7g7vn5r6sslvg9zz9lkt785lfl27_c-8pvmt.yaml": secrets "cattle-credentials-ff9638e" is forbidden: unable to create new content in namespace cattle-system because it is being terminated
Error from server (Forbidden): error when creating "https://example.com/v3/import/jw9wp6bl4t4f6mjdjt5trmkxcn7g7vn5r6sslvg9zz9lkt785lfl27_c-8pvmt.yaml": deployments.apps "cattle-cluster-agent" is forbidden: unable to create new content in namespace cattle-system because it is being terminated


superseb commented Jul 5, 2021

This is a different issue. I assume the cattle-system namespace is stuck in Terminating, as the error suggests, but why? Did you try removing it manually? You probably need to edit or remove its finalizers for the deletion to complete so this action can run successfully.
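One common (equally unsupported) way to clear a namespace stuck in Terminating is to strip its finalizers via the /finalize subresource. A sketch, assuming kubectl and jq are available and pointed at the affected cluster, and that nothing in the namespace is genuinely still deleting:

```shell
# Dump the namespace, drop spec.finalizers, and PUT it back through the
# /finalize subresource so the stuck deletion can complete.
kubectl get namespace cattle-system -o json \
  | jq 'del(.spec.finalizers)' \
  | kubectl replace --raw "/api/v1/namespaces/cattle-system/finalize" -f -
```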


teopost commented Dec 17, 2021

I accidentally deleted the pod cattle-cluster-agent

kubectl delete pod cattle-cluster-agent-56f9c876b9-8xmt2  -n cattle-system

I tried to recreate them with the commands above, which I reproduce below

# Rancher URL
export RANCHERURL="https://rancher.sacchi.lan"

# Set cluster ID
export CLUSTERID="c-rf7lf"

# Set token
export TOKEN="token-6sxh5:l87p5chllwjngczgsvbprstlc79zg8bg8z7p7hj48hjyv5xbddh75pl"

# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
curl --insecure -sfL https://rancher.sacchi.lan/v3/import/s48t7jzdmftdscc5vlw2mscvggtd7r7rb6m6p2qr98srdqtq98njsj.yaml | kubectl apply -f -

# on the controlplane
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml

docker run --rm --net=host -v $PWD/kubeconfig_admin.yaml:/root/.kube/config --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'curl --insecure -sfL https://rancher.sacchi.lan/v3/import/s48t7jzdmftdscc5vlw2mscvggtd7r7rb6m6p2qr98srdqtq98njsj.yaml | kubectl apply -f -'

clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged
namespace/cattle-system unchanged
serviceaccount/cattle unchanged
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged
secret/cattle-credentials-72ce139 unchanged
clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged
deployment.extensions/cattle-cluster-agent unchanged
daemonset.extensions/cattle-node-agent unchanged

But neither cattle-cluster-agent nor cattle-node-agent gets created.

[root@webmasksvi] # kubectl --kubeconfig=$PWD/kubeconfig_admin.yaml get pods -n cattle-system
No resources found in cattle-system namespace.

Could you help me?
Thanks
