@naioja
Last active August 31, 2022 09:53
Problem description
With the latest systemd update for Ubuntu 18.04, the underlying OS for AKS nodes, the following bug surfaced: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119
The affected version is 237-3ubuntu10.54.
Below are some ALTERNATIVE suggestions for how to fix it, besides the official response from Microsoft: https://status.azure.com/en-us/status
1. Fixing it with the az vmss run-command:
```
AKS_MANAGED_GROUP="MC_rg-monitor001_aks001_eastus2"
VMSS_NAME="aks-sysnp001-42513286-vmss"
VMSS_INSTANCE_ID="0"
az vmss run-command invoke \
  -g $AKS_MANAGED_GROUP \
  -n $VMSS_NAME \
  --command-id RunShellScript \
  --instance-id $VMSS_INSTANCE_ID \
  --scripts "echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf && systemctl restart systemd-resolved"
```
Then cycle through the instance IDs in your VMSS; once all are done, move on to the next VMSS, which corresponds to the next node pool.
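Instead of cycling manually, the instance IDs can be enumerated with `az vmss list-instances` and the fix looped over all of them. This is only a sketch: it reuses the example resource group and VMSS name from above, and assumes the Azure CLI is logged in with access to the managed resource group.

```shell
#!/bin/sh
# Sketch: apply the run-command fix to every instance in one VMSS.
# AKS_MANAGED_GROUP and VMSS_NAME are the example values from above; substitute your own.
AKS_MANAGED_GROUP="MC_rg-monitor001_aks001_eastus2"
VMSS_NAME="aks-sysnp001-42513286-vmss"

for id in $(az vmss list-instances -g "$AKS_MANAGED_GROUP" -n "$VMSS_NAME" \
              --query "[].instanceId" -o tsv); do
  echo "Patching instance $id ..."
  az vmss run-command invoke \
    -g "$AKS_MANAGED_GROUP" \
    -n "$VMSS_NAME" \
    --command-id RunShellScript \
    --instance-id "$id" \
    --scripts "echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf && systemctl restart systemd-resolved"
done
```

Repeat the loop once per VMSS (one per node pool).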
2. If you have SSH access enabled on your AKS cluster and a large number of nodes, doing the fix with Ansible could be a valid alternative:
```
# fix.yml
---
- name: Apply the systemd-resolved FallbackDNS fix
  hosts: AKS
  become: true
  become_method: sudo
  become_user: root   # root is needed to edit /etc/systemd/resolved.conf
  tasks:
    - name: Register systemd package version
      command: dpkg-query --showformat='${Version}' --show systemd
      register: systemd_package_version
      changed_when: no

    - name: Check whether /etc/systemd/resolved.conf contains "FallbackDNS=168.63.129.16"
      command: grep -Fxq "FallbackDNS=168.63.129.16" /etc/systemd/resolved.conf
      register: checkmyconf
      check_mode: no
      ignore_errors: yes
      changed_when: no

    - name: Add fix to /etc/systemd/resolved.conf
      lineinfile:
        path: /etc/systemd/resolved.conf
        line: "FallbackDNS=168.63.129.16"
      when: (checkmyconf.rc == 1) and (systemd_package_version.stdout == "237-3ubuntu10.54")
      notify: restart systemd-resolved

  handlers:
    - name: restart systemd-resolved
      systemd:
        name: systemd-resolved
        state: restarted
```
```
# ansible_hosts
[AKS]
10.y.y.y
10.x.x.x
```
Then run the playbook against the inventory:
```
ansible-playbook fix.yml -i ansible_hosts
```
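To populate the `[AKS]` inventory group, the nodes' internal IPs can be listed with kubectl (this assumes you have kubectl access to the cluster; it is not part of the original fix):

```shell
# Print each node's InternalIP, one per line, ready to paste into ansible_hosts.
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
```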
3. If you have just a few nodes in your cluster, you can run a privileged container on each node and apply the fix manually:
```
kubectl debug node/aks-NODENAME-HERE -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0
```
Once logged in, switch into the host OS filesystem by using chroot:
```
chroot /host
```
You can confirm that you are affected by the bug by running ```dpkg-query --showformat='${Version}' --show systemd```;
if it returns `237-3ubuntu10.54`, you are affected.
Then simply run ```echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf``` and restart the service with ```systemctl restart systemd-resolved```.
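Guarding the append with `grep -qxF` makes it safe to repeat (the line is only added once). A quick local demonstration on a scratch file, not the real resolved.conf:

```shell
# Demonstrate the idempotent-append pattern on a temporary scratch file.
conf=$(mktemp)
printf '[Resolve]\n' > "$conf"

add_fallback() {
  # Append the FallbackDNS line only if it is not already present verbatim.
  grep -qxF 'FallbackDNS=168.63.129.16' "$1" || \
    echo 'FallbackDNS=168.63.129.16' >> "$1"
}

add_fallback "$conf"
add_fallback "$conf"          # second run is a no-op

grep -c 'FallbackDNS' "$conf" # prints 1
```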
4. Create a DaemonSet that adds the FallbackDNS entry to the config file and restarts the service:
```
# systemd-fix-daemonset.yml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: systemd-fix-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      job: systemd-fix-daemonset
  template:
    metadata:
      labels:
        job: systemd-fix-daemonset
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
          effect: NoSchedule
        - key: WORKLOAD
          operator: Exists
          effect: NoSchedule
      volumes:
        - name: hostfs
          hostPath:
            path: /
      hostPID: true
      restartPolicy: Always
      nodeSelector:
        "kubernetes.io/os": linux
      initContainers:
        - name: init
          image: alpine
          securityContext:
            privileged: true  # needed to chroot into the host and restart a host service
          command:
            - /bin/sh
            - -xc
            - |
              chroot /host /bin/sh -c "grep -qxF 'FallbackDNS=168.63.129.16' /etc/systemd/resolved.conf || { echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf && systemctl restart systemd-resolved; }"
          volumeMounts:
            - name: hostfs
              mountPath: /host
      containers:
        - name: sleep
          image: kubernetes/pause
```
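The DaemonSet can then be rolled out with standard kubectl commands, and deleted again once every node has been patched, since the fix is a one-shot init container (the filename matches the manifest above):

```shell
kubectl apply -f systemd-fix-daemonset.yml
# Wait until the init container has completed on every node.
kubectl -n kube-system rollout status daemonset/systemd-fix-daemonset
# The fix is one-shot, so the DaemonSet can be removed afterwards.
kubectl -n kube-system delete daemonset systemd-fix-daemonset
```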