@dougbtv
Last active July 24, 2019 19:58
Multus investigation for BZ 1732598

Replicate the issue.

Essentially, I run through the Multus quick start and see that the pod comes up with no additional interface...

$ ./oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-128-219.us-west-2.compute.internal   Ready    master   16m     v1.14.0+b3e2c8a2b
ip-10-0-136-69.us-west-2.compute.internal    Ready    worker   8m15s   v1.14.0+b3e2c8a2b
ip-10-0-144-195.us-west-2.compute.internal   Ready    master   16m     v1.14.0+b3e2c8a2b
ip-10-0-151-105.us-west-2.compute.internal   Ready    worker   8m24s   v1.14.0+b3e2c8a2b
ip-10-0-160-7.us-west-2.compute.internal     Ready    master   16m     v1.14.0+b3e2c8a2b
ip-10-0-169-1.us-west-2.compute.internal     Ready    worker   8m3s    v1.14.0+b3e2c8a2b
$ ./oc new-project doug
Now using project "doug" on server "https://api.debugmultus5.highfive.delivery:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app django-psql-example

to build a new example application in Python. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=gcr.io/hello-minikube-zero-install/hello-node

$ ./oc project
Using project "doug" on server "https://api.debugmultus5.highfive.delivery:6443".
$ cat <<EOF | ./oc create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.1.0/24",
        "rangeStart": "192.168.1.200",
        "rangeEnd": "192.168.1.216",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ],
        "gateway": "192.168.1.1"
      }
    }'
EOF
networkattachmentdefinition.k8s.cni.cncf.io/macvlan-conf created
$ ./oc get networkattachmentdefinition.k8s.cni.cncf.io
NAME           AGE
macvlan-conf   11s
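
To double-check that the spec landed intact, you can also dump the object back out (using the net-attach-def short name, assuming it's registered by the CRD; otherwise spell out networkattachmentdefinition):

$ ./oc get net-attach-def macvlan-conf -o yaml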
$ cat <<EOF | ./oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  containers:
  - name: samplepod
    command: ["/bin/bash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: dougbtv/centos-network
EOF
pod/samplepod created

$ watch -n1 ./oc get pods -o wide

$ ./oc get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
samplepod   1/1     Running   0          38s   10.131.0.14   ip-10-0-151-105.us-west-2.compute.internal   <none>           <none>
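
Another quick check (assuming this version of Multus writes a networks-status annotation when it actually runs) is to look at the pod's annotations:

$ ./oc get pod samplepod -o jsonpath='{.metadata.annotations}'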

$ ./oc exec -it samplepod -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8951 qdisc noqueue state UP 
    link/ether 0a:58:0a:83:00:0e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.131.0.14/23 brd 10.131.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e4a8:2fff:fed6:76e8/64 scope link 
       valid_lft forever preferred_lft forever
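
For comparison, on a working setup the same command should also show a net1 macvlan interface with an address from the configured host-local range, something like this (hypothetical output):

4: net1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 4a:fe:19:22:52:4d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.200/24 scope global net1
       valid_lft forever preferred_lft forever

Here there's only lo and eth0, so the macvlan attachment never happened.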

Check the Multus entrypoint logs...

Figure out the name of the Multus pod running on the same node as the sample pod...

$ ./oc project openshift-multus
Now using project "openshift-multus" on server "https://api.debugmultus5.highfive.delivery:6443".
$ ./oc get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
multus-8lhsk                        1/1     Running   0          20m   10.0.169.1     ip-10-0-169-1.us-west-2.compute.internal     <none>           <none>
multus-admission-controller-2wlmz   1/1     Running   0          29m   10.130.0.8     ip-10-0-160-7.us-west-2.compute.internal     <none>           <none>
multus-admission-controller-dhsp9   1/1     Running   0          29m   10.128.0.6     ip-10-0-128-219.us-west-2.compute.internal   <none>           <none>
multus-admission-controller-xb5cm   1/1     Running   0          29m   10.129.0.14    ip-10-0-144-195.us-west-2.compute.internal   <none>           <none>
multus-dgrnb                        1/1     Running   0          21m   10.0.136.69    ip-10-0-136-69.us-west-2.compute.internal    <none>           <none>
multus-jvtvv                        1/1     Running   0          29m   10.0.128.219   ip-10-0-128-219.us-west-2.compute.internal   <none>           <none>
multus-krswd                        1/1     Running   0          29m   10.0.144.195   ip-10-0-144-195.us-west-2.compute.internal   <none>           <none>
multus-kt2rc                        1/1     Running   0          21m   10.0.151.105   ip-10-0-151-105.us-west-2.compute.internal   <none>           <none>
multus-rbm6g                        1/1     Running   0          29m   10.0.160.7     ip-10-0-160-7.us-west-2.compute.internal     <none>           <none>
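
An alternative to eyeballing the node column is filtering on the node name directly (assuming field selectors are supported for this):

$ ./oc get pods -o wide --field-selector spec.nodeName=ip-10-0-151-105.us-west-2.compute.internal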

Get the logs for it.

$ ./oc logs multus-kt2rc
2019-07-24T16:51:21+0000 Generating Multus configuration file ...
2019-07-24T16:51:21+0000 Config file created @ /host/etc/cni/net.d/00-multus.conf
{ "name": "multus-cni-network", "type": "multus", "namespaceIsolation": true, "logLevel": "verbose", "kubeconfig": "/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ { "cniVersion": "0.3.1", "name": "openshift-sdn", "type": "openshift-sdn" } ] }
2019-07-24T16:51:21+0000 Entering sleep... (success)

Looks fine, appears to have written its configs.

Now debug the node...

Check out what's in the CNI dirs...

$ ./oc debug node/ip-10-0-151-105.us-west-2.compute.internal --image=busybox
Starting pod/ip-10-0-151-105us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.151.105
If you don't see a command prompt, try pressing enter.
/ # cd /host/
/host # ls etc/cni/net.d/
100-crio-bridge.conf       200-loopback.conf          87-podman-bridge.conflist
/host # ls etc/kubernetes/cni/net.d/
00-multus.conf             80-openshift-network.conf  multus.d
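
Listing that directory with timestamps is also handy to confirm the Multus entrypoint really rewrote 00-multus.conf after the last boot:

/host # ls -la etc/kubernetes/cni/net.d/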

Edit the Multus config to add some extra debugging (a log file at /var/log/multus.log and debug log level), then see whether that log file ever gets written...

/host # cat etc/kubernetes/cni/net.d/00-multus.conf 
{ "name": "multus-cni-network", "type": "multus", "namespaceIsolation": true, "logFile": "/var/log/multus.log", "logLevel": "debug", "kubeconfig": "/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ { "cniVersion": "0.3.1", "name": "openshift-sdn", "type": "openshift-sdn" } ] }
/host # ls var/log/multus.log
ls: var/log/multus.log: No such file or directory
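
Since the log file never shows up, it looks like the Multus binary isn't getting invoked at all. While in here, it's also worth confirming the binary was actually copied into the CNI plugin directory (assuming the plugin_dir from crio.conf below):

/host # ls -la var/lib/cni/bin/ | grep multus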

See what the Kubelet args are...

/host # ps ax | grep -i kubelet
 1099 root      1:53 /usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider=aws --v=3
 6995 root      0:00 grep -i kubelet

Dan mentions that it's CRI-O that cares about the CNI directories, not the kubelet, so check the CRI-O config...

/host # cat etc/crio/crio.conf | grep "_dir ="
container_exits_dir = "/var/run/crio/exits"
container_attach_socket_dir = "/var/run/crio"
network_dir = "/etc/kubernetes/cni/net.d/"
plugin_dir = "/var/lib/cni/bin"

Those appear to be correct.

CRI-O logs are really quiet...

/host # chroot /host
[root@ip-10-0-151-105 /]# journalctl -u crio
-- Logs begin at Wed 2019-07-24 16:46:55 UTC, end at Wed 2019-07-24 18:25:07 UTC. --
Jul 24 16:48:22 ip-10-0-151-105 systemd[1]: Starting Open Container Initiative Daemon...
Jul 24 16:48:29 ip-10-0-151-105 systemd[1]: Started Open Container Initiative Daemon.
Jul 24 16:49:30 ip-10-0-151-105 systemd[1]: Stopping Open Container Initiative Daemon...
Jul 24 16:49:30 ip-10-0-151-105 systemd[1]: Stopped Open Container Initiative Daemon.
Jul 24 16:49:30 ip-10-0-151-105 systemd[1]: crio.service: Consumed 339ms CPU time
-- Reboot --
Jul 24 16:50:16 ip-10-0-151-105 systemd[1]: Starting Open Container Initiative Daemon...
Jul 24 16:50:16 ip-10-0-151-105 systemd[1]: Started Open Container Initiative Daemon.

Enable CRI-O debug logging...

Edited /etc/crio/crio.conf with vi and changed the log level...

[root@ip-10-0-151-105 /]# cat /etc/crio/crio.conf | grep log_level
# log_level changes the verbosity of the logs printed.
log_level = "debug"
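
As an aside, restarting just the CRI-O service would probably pick this up too (untested here):

[root@ip-10-0-151-105 /]# systemctl restart crio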

Then I rebooted the node and looked through the logs.

Found this snippet...

Jul 24 18:46:09 ip-10-0-151-105 crio[1048]: time="2019-07-24 18:46:09.873557757Z" level=info msg="Got pod network &{Name:samplepod Namespace:doug ID:02fc2cea9a1aafea6ba30caed2c743dc973b2bd1ecca8e7b1c46c08b99ae39a9 NetNS:/proc/9158/ns/net PortMappings:[] Networks:[] NetworkConfig:map[]}"
Jul 24 18:46:09 ip-10-0-151-105 crio[1048]: time="2019-07-24 18:46:09.873579553Z" level=info msg="About to add CNI network cni-loopback (type=loopback)"
Jul 24 18:46:09 ip-10-0-151-105 crio[1048]: time="2019-07-24 18:46:09.876138171Z" level=info msg="Got pod network &{Name:samplepod Namespace:doug ID:02fc2cea9a1aafea6ba30caed2c743dc973b2bd1ecca8e7b1c46c08b99ae39a9 NetNS:/proc/9158/ns/net PortMappings:[] Networks:[] NetworkConfig:map[]}"
Jul 24 18:46:09 ip-10-0-151-105 crio[1048]: time="2019-07-24 18:46:09.876157773Z" level=info msg="About to add CNI network openshift-sdn (type=openshift-sdn)"

May be related to this CRI-O issue Casey filed.
