@praveenkumar
Created March 5, 2019 06:01
[DONT DELETE] installer related gist and comments.
This gist contains the installer-related comments.
@praveenkumar
Author

Running installer tags 0.13.0 and 0.13.1 with the libvirt provider fails to get the console operator up and running, and you can expect the following output from the console container.

2019/03/2 10:05:03 auth: error contacting auth provider (retrying in 2m8s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.test.tt.testing/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.test.tt.testing: dial tcp: lookup openshift-authentication-openshift-authentication.apps.test.tt.testing on 172.30.0.10:53: no such host

The problem is that the route for openshift-authentication is created and the console tries to consume it, but the entry for this route is not present in the libvirt network definition.

# virsh net-dumpxml test1-t9fq7 
<network connections='1'>
  <name>test1-t9fq7</name>
  <uuid>ddaa1c7f-9b27-4fb7-b325-623a26ae9e7e</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='tt0' stp='on' delay='0'/>
  <mac address='52:54:00:da:2f:3b'/>
  <domain name='test1.tt.testing' localOnly='yes'/>
  <dns>
    <srv service='etcd-server-ssl' protocol='tcp' domain='test1.tt.testing' target='etcd-0.test1.tt.testing' port='2380' weight='10'/>
    <host ip='192.168.126.11'>
      <hostname>api.test1.tt.testing</hostname>
      <hostname>etcd-0.test1.tt.testing</hostname>
    </host>
  </dns>
  <ip family='ipv4' address='192.168.126.1' prefix='24'>
    <dhcp>
      <host mac='52:fd:fc:07:21:82' name='test1-t9fq7-master-0' ip='192.168.126.11'/>
      <host mac='66:4f:16:3f:5f:0f' name='test1-t9fq7-bootstrap' ip='192.168.126.10'/>
      <host mac='62:73:a0:15:97:a1' name='test1-t9fq7-worker-0-z667d' ip='192.168.126.51'/>
    </dhcp>
  </ip>
</network>
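
You can confirm the missing record from the host by querying the network's dnsmasq directly (192.168.126.1 is the bridge address from the definition above; expect an empty answer until the workaround below is applied):

$ dig +short @192.168.126.1 openshift-authentication-openshift-authentication.apps.test1.tt.testing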

As a workaround you first need to add that DNS entry for the worker node. At this point we assume the worker IP is 192.168.126.51, so as soon as the network definition is created you need to update it with the DNS entry for this route.

$ sudo virsh net-list 
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 test1-t9fq7          active     yes           yes

$ sudo virsh net-update test1-t9fq7 add dns-host "<host ip='192.168.126.51'><hostname>openshift-authentication-openshift-authentication.apps.test1.tt.testing</hostname></host>"
Updated network test1-t9fq7 live state

$ sudo virsh net-dumpxml test1-t9fq7 
<network>
  <name>test1-t9fq7</name>
  <uuid>ddaa1c7f-9b27-4fb7-b325-623a26ae9e7e</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='tt0' stp='on' delay='0'/>
  <mac address='52:54:00:da:2f:3b'/>
  <domain name='test1.tt.testing' localOnly='yes'/>
  <dns>
    <srv service='etcd-server-ssl' protocol='tcp' domain='test1.tt.testing' target='etcd-0.test1.tt.testing' port='2380' weight='10'/>
    <host ip='192.168.126.10'>
      <hostname>api.test1.tt.testing</hostname>
    </host>
    <host ip='192.168.126.11'>
      <hostname>api.test1.tt.testing</hostname>
      <hostname>etcd-0.test1.tt.testing</hostname>
    </host>
    <host ip='192.168.126.51'>
      <hostname>openshift-authentication-openshift-authentication.apps.test1.tt.testing</hostname>
    </host>
  </dns>
  <ip family='ipv4' address='192.168.126.1' prefix='24'>
    <dhcp>
      <host mac='52:fd:fc:07:21:82' name='test1-t9fq7-master-0' ip='192.168.126.11'/>
      <host mac='66:4f:16:3f:5f:0f' name='test1-t9fq7-bootstrap' ip='192.168.126.10'/>
    </dhcp>
  </ip>
</network>
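
Note that the net-update above only changed the live state of the network. Assuming your libvirt version supports it, the same entry can be written to both live and persistent state in one go so it survives a network restart:

$ sudo virsh net-update test1-t9fq7 add dns-host "<host ip='192.168.126.51'><hostname>openshift-authentication-openshift-authentication.apps.test1.tt.testing</hostname></host>" --live --config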

@praveenkumar
Author

praveenkumar commented Mar 6, 2019

$ cat 99-master-kubelet-no-taint.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 02-master-kubelet
spec:
  config:
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Kubernetes Kubelet
          Wants=rpc-statd.service

          [Service]
          Type=notify
          ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
          ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
          EnvironmentFile=-/etc/kubernetes/kubelet-workaround
          EnvironmentFile=-/etc/kubernetes/kubelet-env

          ExecStart=/usr/bin/hyperkube \
              kubelet \
                --config=/etc/kubernetes/kubelet.conf \
                --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
                --rotate-certificates \
                --kubeconfig=/var/lib/kubelet/kubeconfig \
                --container-runtime=remote \
                --container-runtime-endpoint=/var/run/crio/crio.sock \
                --allow-privileged \
                --node-labels=node-role.kubernetes.io/master \
                --minimum-container-ttl-duration=6m0s \
                --client-ca-file=/etc/kubernetes/ca.crt \
                --cloud-provider= \
                --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
                \
                --anonymous-auth=false \

          Restart=always
          RestartSec=10

          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: kubelet.service
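
Once the cluster is up, you can confirm the machine-config operator picked up this unit for the master pool (02-master-kubelet is the metadata.name above; mcp is the shortname for machineconfigpool):

$ oc get machineconfig 02-master-kubelet
$ oc get mcp master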

Node selector for worker

node-role.kubernetes.io/worker: ""
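
This is the selector the default ingress router pods are scheduled with, which is why the master needs the worker label in the next comment. A quick way to check it on a running cluster (namespace and deployment name assumed from a default install):

$ oc -n openshift-ingress get deployment router-default -o jsonpath='{.spec.template.spec.nodeSelector}'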

@praveenkumar
Author

Adding the worker label to the master.

INSTALL_DIR=test

# Destroy an existing cluster and resources
./openshift-install --dir $INSTALL_DIR destroy cluster --log-level debug

# Create the INSTALL_DIR for the installer and copy the install-config
rm -fr $INSTALL_DIR && mkdir $INSTALL_DIR && cp install-config.yaml $INSTALL_DIR

# Create the manifests using the INSTALL_DIR
./openshift-install --dir $INSTALL_DIR create manifests

# Copy the config which removes taint from master
cp 99_master-kubelet-no-taint.yaml $INSTALL_DIR/openshift/

# Edit $INSTALL_DIR/openshift/99_openshift-cluster-api_master-machines-0.yaml and add spec.metadata.labels with node-role.kubernetes.io/worker: "" (see the yq sketch below)
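# (sketch) The same edit done non-interactively; assumes yq v4 is installed,
# adjust the expression if your yq version uses different syntax:
yq -i '.spec.metadata.labels."node-role.kubernetes.io/worker" = ""' \
    $INSTALL_DIR/openshift/99_openshift-cluster-api_master-machines-0.yaml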

# Start the cluster with 10GB memory and 4 vCPUs; create it and wait till it finishes
export TF_VAR_libvirt_master_memory=10192
export TF_VAR_libvirt_master_vcpu=4
./openshift-install --dir $INSTALL_DIR create cluster
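
Once create cluster returns, a quick health check of the operators (co is the shortname for clusteroperators) before looking at the machines:

$ oc get co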

# check the machine from openshift-machine-api namespace.
$ oc get machines -n openshift-machine-api 
NAME                   INSTANCE   STATE   TYPE   REGION   ZONE   AGE
test1-9k6cz-master-0                                             8m1s

$ oc get machine test1-9k6cz-master-0 -oyaml -n openshift-machine-api 
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  creationTimestamp: 2019-03-07T08:37:22Z
  finalizers:
  - machine.machine.openshift.io
  generation: 1
  labels:
    machine.openshift.io/cluster-api-cluster: test1-9k6cz
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: test1-9k6cz-master-0
  namespace: openshift-machine-api
  resourceVersion: "4590"
  selfLink: /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/test1-9k6cz-master-0
  uid: 3a2453d5-40b4-11e9-a93a-52fdfc072182
spec:
  metadata:
    creationTimestamp: null
    labels:
      node-role.kubernetes.io/worker: ""    ==> This contains the change we made before starting the cluster.
  providerSpec:
    value:
      apiVersion: libvirtproviderconfig.k8s.io/v1alpha1
      autostart: false
      cloudInit: null
      domainMemory: 4096
      domainVcpu: 2
      ignKey: ""
      ignition:
        userDataSecret: master-user-data
      kind: LibvirtMachineProviderConfig
      networkInterfaceAddress: 192.168.126.0/24
      networkInterfaceHostname: ""
      networkInterfaceName: test1-9k6cz
      networkUUID: ""
      uri: qemu+tcp://libvirt.default/system
      volume:
        baseVolumeID: /var/lib/libvirt/images/test1-9k6cz-base
        poolName: default
        volumeName: ""
  versions:
    kubelet: ""

# Check the labels on the master (this doesn't contain the worker label yet)
$ oc get nodes --show-labels
NAME                   STATUS   ROLES    AGE   VERSION              LABELS
test1-9k6cz-master-0   Ready    master   21m   v1.12.4+4dd65df23d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=test1-9k6cz-master-0,node-role.kubernetes.io/master=

# Need to add label manually
$ oc label nodes test1-9k6cz-master-0 node-role.kubernetes.io/worker=
node/test1-9k6cz-master-0 labeled

# Now the master has the worker label and ingress is also scheduled successfully
$ oc get nodes --show-labels
NAME                   STATUS   ROLES           AGE   VERSION              LABELS
test1-9k6cz-master-0   Ready    master,worker   22m   v1.12.4+4dd65df23d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=test1-9k6cz-master-0,node-role.kubernetes.io/master=,node-role.kubernetes.io/worker=
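
With the worker label present, the router pods in openshift-ingress should come out of Pending and land on this node (namespace assumed from a default install):

$ oc -n openshift-ingress get pods -o wide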

@praveenkumar
Author

How to debug a node

$ oc debug node/<node_name>
$ chroot /host /bin/bash
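
From the chroot'd shell you can then inspect the node directly, for example (assuming an RHCOS node where these tools are present):

$ systemctl status kubelet
$ crictl ps
$ journalctl -u kubelet --since "10 min ago"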

Get the endpoints of a service

$ oc get svc
$ oc get ep <svc_name>
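
For example, to look at the endpoints behind the authentication service from the first comment (namespace taken from the route name above; the service name is an assumption from a default deployment):

$ oc -n openshift-authentication get svc
$ oc -n openshift-authentication get ep oauth-openshift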
