@zaki-lknr
Last active July 16, 2020 01:22
OKD4 installation work log

Combinations tried

OKD4 GA!!! (OKD 4.5)

| FCOS | stream | OKD | result | comment |
|---|---|---|---|---|
| 32.20200629.3.0 | stable | 4.5.0-0.okd-2020-07-14-153706-ga | 7/16: OK | No reboot needed either |
| 31.20200118.3.0 | stable | 4.4.0-0.okd-2020-01-28-022517 (preview 2) | 2/5: failed | |
| 31.20200118.3.0 | stable | 4.4.0-0.okd-2020-02-05-224417 | 2/6: failed | |
| 31.20200127.2.0 | testing | 4.4.0-0.okd-2020-01-28-022517 (preview 2) | 2/7: failed, 2/8: OK | Reboot once after the bootstrap node has fully come up! |
| 31.20200127.2.0 | testing | 4.4.0-0.okd-2020-02-05-224417 | Not tried | |
| 31.20191217.2.0 | testing | 4.4.0-0.okd-2020-01-28-022517 (preview 2) | 2/6: failed, 2/7: failed | The OS installation does not seem to complete (cannot even ssh in) |

Combination that failed yesterday (2/5)

FCOS

31.20200118.3.0 / stable / PXE version

Download Fedora CoreOS

OKD

4.4.0-0.okd-2020-01-28-022517, which is marked Preview 2

Release 4.4.0-0.okd-2020-01-28-022517

config

apiVersion: v1
baseDomain: naru.jp-z.jp
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: okd4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: ...
sshKey: ...

Installing CoreOS on Bare Metal :: Fedora Docs Site

For PXE boot

DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL fedora-coreos-31.20200118.3.0-live-kernel-x86_64
    APPEND ip=dhcp rd.neednet=1 initrd=fedora-coreos-31.20200118.3.0-live-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst.install_dev=sda coreos.inst.stream=stable coreos.inst.ignition_url=http://192.168.0.19/cos/bootstrap.ign coreos.inst.image_url=http://192.168.0.19/cos/fedora-coreos-31.20200118.3.0-metal.x86_64.raw.xz
IPAPPEND 2

Don't forget to place the sig file in the same path as the fedora-coreos-31.20200118.3.0-metal.x86_64.raw.xz file.
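The access log later in this note shows the installer requesting the `.raw.xz.sig` file alongside the image, so it can help to check both files before booting a node. A minimal sketch, assuming the signature is simply the image name plus `.sig` (as in that access log); `has_image_and_sig` is a hypothetical helper name, not part of any FCOS tooling.

```shell
# Hypothetical helper: succeed only if both the image and its .sig exist in DIR.
# Usage: has_image_and_sig DIR IMAGE_NAME
has_image_and_sig() {
    dir=$1
    image=$2
    if [ ! -f "$dir/$image" ]; then
        echo "missing image: $dir/$image" >&2
        return 1
    fi
    if [ ! -f "$dir/$image.sig" ]; then
        echo "missing signature: $dir/$image.sig" >&2
        return 1
    fi
    echo "ok: $image and $image.sig"
}
```

On the web server used here this would be called as `has_image_and_sig /var/www/html/cos fedora-coreos-31.20200118.3.0-metal.x86_64.raw.xz`.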

With this, FCOS itself boots, but some processes seem to be missing; for example, etcd does not start on the master nodes.


$ ./openshift-install create manifests --dir=bare-metal
--- cluster-scheduler-02-config.yml     2020-02-02 21:00:42.267823716 +0900
+++ bare-metal/manifests/cluster-scheduler-02-config.yml        2020-02-02 21:00:51.442966340 +0900
@@ -4,7 +4,7 @@
   creationTimestamp: null
   name: cluster
 spec:
-  mastersSchedulable: true
+  mastersSchedulable: false
   policy:
     name: ""
 status: {}

With this step included, the OS fails to boot: at startup, emergency.service prints messages such as "Failed to set up standard input: Inappropriate ioctl for device" and "Main process exited, code=exited, status=208/STDIN".
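The manifest change shown in the diff above can also be applied non-interactively. A minimal sketch, assuming the manifest contains the exact text `mastersSchedulable: true`; `set_masters_unschedulable` is a hypothetical helper name. Note that in this log the change was followed by the boot failure described above, so this is an illustration of the edit, not a recommendation.

```shell
# Hypothetical helper: flip mastersSchedulable from true to false in a manifest.
# Writes to a temporary file first so the manifest is replaced only if sed succeeds.
set_masters_unschedulable() {
    manifest=$1
    tmp="$manifest.tmp"
    sed 's/mastersSchedulable: true/mastersSchedulable: false/' "$manifest" > "$tmp" &&
        mv "$tmp" "$manifest"
}
```

With the directory layout used here, the call would be `set_masters_unschedulable bare-metal/manifests/cluster-scheduler-02-config.yml`, run after `create manifests`.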

Using the following release, which is marked Accepted as of now (18:54):
Release 4.4.0-0.okd-2020-02-05-224417

FCOS is the same one.

install-config

$ mkdir okd4.4-2020-02-05-185904
$ cd okd4.4-2020-02-05-185904
$ oc adm release extract --tools registry.svc.ci.openshift.org/origin/release:4.4.0-0.okd-2020-02-05-224417

Ah, I should have run it with time. It took a little under a minute, though.

$ ll
合計 106672
-rwxr-xr-x. 1 zaki zaki 25139577  2月  6 00:44 openshift-client-linux-4.4.0-0.okd-2020-02-05-224417.tar.gz
-rwxr-xr-x. 1 zaki zaki 84058416  2月  4 08:31 openshift-install-linux-4.4.0-0.okd-2020-02-05-224417.tar.gz
-rw-r--r--. 1 zaki zaki    20580  2月  6 18:55 release.txt
-rw-r--r--. 1 zaki zaki      331  2月  6 18:55 sha256sum.txt
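Since the extract also produces sha256sum.txt, the downloaded archives can be checked against it before unpacking. A minimal sketch, assuming sha256sum.txt uses the standard `sha256sum` output format (hash, then file name) and may list files that were not extracted here; `verify_release_archives` is a hypothetical helper name.

```shell
# Hypothetical helper: verify the archives in DIR against DIR/sha256sum.txt.
# Entries whose file is absent are skipped; if nothing is left to check,
# sha256sum reports an error and the function fails.
verify_release_archives() {
    dir=$1
    (
        cd "$dir" || exit 1
        while read -r sum name; do
            name=${name#\*}   # drop the binary-mode marker, if present
            [ -f "$name" ] && printf '%s  %s\n' "$sum" "$name"
        done < sha256sum.txt | sha256sum -c -
    )
}
```

From inside the extract directory this would be run as `verify_release_archives .`.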

Get the installer

$ mkdir bare-metal
$ cp ../install-config.yaml bare-metal/
$ tar xf openshift-install-linux-4.4.0-0.okd-2020-02-05-224417.tar.gz 
$ ll openshift-install
-rwxr-xr-x. 1 zaki zaki 332165120  2月  4 08:31 openshift-install

Create the ignition files

$ ./openshift-install create ignition-configs --dir=bare-metal
INFO Consuming Install Config from target directory 
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings 
$ find bare-metal/
bare-metal/
bare-metal/.openshift_install.log
bare-metal/.openshift_install_state.json
bare-metal/auth
bare-metal/auth/kubeconfig
bare-metal/auth/kubeadmin-password
bare-metal/master.ign
bare-metal/worker.ign
bare-metal/bootstrap.ign
bare-metal/metadata.json

Place *.ign on the HTTP server

$ ll bare-metal/*ign
-rw-r-----. 1 zaki zaki 297062  2月  6 19:01 bare-metal/bootstrap.ign
-rw-r-----. 1 zaki zaki   1851  2月  6 19:01 bare-metal/master.ign
-rw-r-----. 1 zaki zaki   1851  2月  6 19:01 bare-metal/worker.ign
$ scp bare-metal/*.ign 192.168.0.19:.

Web server

# ll /var/www/html/cos/*ign
-rw-r--r--. 1 root root 297062  2月  6 19:03 /var/www/html/cos/bootstrap.ign
-rw-r--r--. 1 root root   1851  2月  6 19:03 /var/www/html/cos/master.ign
-rw-r--r--. 1 root root   1851  2月  6 19:03 /var/www/html/cos/worker.ign

start

Recreate the bootstrap node's disk and power it on.

==> /var/log/messages <==
Feb  6 19:06:04 manager dnsmasq-dhcp[24758]: DHCPDISCOVER(ens224) 00:0c:29:a6:cc:12
Feb  6 19:06:04 manager dnsmasq-dhcp[24758]: DHCPOFFER(ens224) 172.16.0.51 00:0c:29:a6:cc:12
Feb  6 19:06:06 manager dnsmasq-dhcp[24758]: DHCPREQUEST(ens224) 172.16.0.51 00:0c:29:a6:cc:12
Feb  6 19:06:06 manager dnsmasq-dhcp[24758]: DHCPACK(ens224) 172.16.0.51 00:0c:29:a6:cc:12 okd4-bootstrap
Feb  6 19:06:06 manager dnsmasq-tftp[24758]: error 0 TFTP Aborted received from 172.16.0.51
Feb  6 19:06:06 manager dnsmasq-tftp[24758]: failed sending /var/lib/tftpboot/pxelinux.0 to 172.16.0.51
Feb  6 19:06:06 manager dnsmasq-tftp[24758]: sent /var/lib/tftpboot/pxelinux.0 to 172.16.0.51
Feb  6 19:06:07 manager dnsmasq-tftp[24758]: file /var/lib/tftpboot/pxelinux.cfg/564d83dc-ea1b-2ab3-6525-7f62fea6cc12 not found
Feb  6 19:06:07 manager dnsmasq-tftp[24758]: sent /var/lib/tftpboot/pxelinux.cfg/01-00-0c-29-a6-cc-12 to 172.16.0.51
Feb  6 19:06:07 manager dnsmasq-tftp[24758]: sent /var/lib/tftpboot/fedora-coreos-31.20200118.3.0-live-kernel-x86_64 to 172.16.0.51
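The TFTP log above shows the client asking for a per-host config named after its MAC address (`01-00-0c-29-a6-cc-12` for `00:0c:29:a6:cc:12`). A minimal sketch of deriving that file name, assuming the usual pxelinux convention of an `01-` prefix (Ethernet) followed by the lowercase MAC with dashes; `pxe_config_name` is a hypothetical helper name.

```shell
# Hypothetical helper: map a MAC address to the pxelinux.cfg file name
# requested by the client (01- prefix, lowercase hex, dashes instead of colons).
pxe_config_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}
```

For example, `pxe_config_name 00:0c:29:a6:cc:12` prints the name seen in the log, so a config for a new node could be written to `/var/lib/tftpboot/pxelinux.cfg/$(pxe_config_name "$mac")`.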

Boot

==> /var/log/httpd/access_log <==
172.16.0.51 - - [06/Feb/2020:19:08:02 +0900] "GET /cos/bootstrap.ign HTTP/1.1" 200 297062 "-" "curl/7.66.0"
172.16.0.51 - - [06/Feb/2020:19:08:02 +0900] "GET /cos/fedora-coreos-31.20200118.3.0-metal.x86_64.raw.xz.sig HTTP/1.1" 200 543 "-" "reqwest/0.9.24"
172.16.0.51 - - [06/Feb/2020:19:08:02 +0900] "GET /cos/fedora-coreos-31.20200118.3.0-metal.x86_64.raw.xz HTTP/1.1" 200 455336040 "-" "reqwest/0.9.24"

==> /var/log/messages <==
Feb  6 19:08:48 manager dnsmasq-dhcp[24758]: DHCPDISCOVER(ens224) 00:0c:29:a6:cc:12
Feb  6 19:08:48 manager dnsmasq-dhcp[24758]: DHCPOFFER(ens224) 172.16.0.51 00:0c:29:a6:cc:12
Feb  6 19:08:48 manager dnsmasq-dhcp[24758]: DHCPREQUEST(ens224) 172.16.0.51 00:0c:29:a6:cc:12
Feb  6 19:08:48 manager dnsmasq-dhcp[24758]: DHCPACK(ens224) 172.16.0.51 00:0c:29:a6:cc:12 okd4-bootstrap
Feb  6 19:08:55 manager dnsmasq-dhcp[24758]: DHCPDISCOVER(ens224) 00:0c:29:a6:cc:12
Feb  6 19:08:55 manager dnsmasq-dhcp[24758]: DHCPOFFER(ens224) 172.16.0.51 00:0c:29:a6:cc:12
Feb  6 19:08:55 manager dnsmasq-dhcp[24758]: DHCPREQUEST(ens224) 172.16.0.51 00:0c:29:a6:cc:12
Feb  6 19:08:55 manager dnsmasq-dhcp[24758]: DHCPACK(ens224) 172.16.0.51 00:0c:29:a6:cc:12 okd4-bootstrap

Once the console looks like this:

(screenshot: 2020-02-06_19h09_16)

the processing starts, so wait a while longer.

ssh

At a suitable moment:

$ ssh core@okd4-bootstrap
The authenticity of host 'okd4-bootstrap (172.16.0.51)' can't be established.
ECDSA key fingerprint is SHA256:Sx9FIkvj3+m3pK8izr8k16EN+MJPwzUkcLDI/4xFgbs.
ECDSA key fingerprint is MD5:5d:eb:5b:db:36:1d:28:a1:f0:8a:f0:97:69:15:15:a2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'okd4-bootstrap,172.16.0.51' (ECDSA) to the list of known hosts.
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary service is "bootkube.service". To watch its status, run e.g.

  journalctl -b -f -u bootkube.service
Fedora CoreOS 31.20200203.20.0
Tracker: https://github.com/coreos/fedora-coreos-tracker

[core@okd4-bootstrap ~]$

Good.

[core@okd4-bootstrap ~]$ sudo ss -anpt | grep -i listen
LISTEN    0      4096              127.0.0.1:46437                0.0.0.0:*      users:(("crio",pid=4322,fd=12))                                                
LISTEN    0      4096              127.0.0.1:10248                0.0.0.0:*      users:(("kubelet",pid=4338,fd=26))                                             
LISTEN    0      128                 0.0.0.0:59439                0.0.0.0:*      users:(("rpc.statd",pid=869,fd=8))                                             
LISTEN    0      128                 0.0.0.0:111                  0.0.0.0:*      users:(("rpcbind",pid=862,fd=4),("systemd",pid=1,fd=42))                       
LISTEN    0      128                 0.0.0.0:22                   0.0.0.0:*      users:(("sshd",pid=800,fd=4))                                                  
LISTEN    0      4096                      *:10250                      *:*      users:(("kubelet",pid=4338,fd=28))                                             
LISTEN    0      4096                      *:6443                       *:*      users:(("kube-etcd-signe",pid=4077,fd=3))                                      
LISTEN    0      4096                      *:10255                      *:*      users:(("kubelet",pid=4338,fd=29))                                             
LISTEN    0      128                    [::]:111                     [::]:*      users:(("rpcbind",pid=862,fd=6),("systemd",pid=1,fd=44))                       
LISTEN    0      128                    [::]:22                      [::]:*      users:(("sshd",pid=800,fd=6))                                                  
LISTEN    0      128                    [::]:60893                   [::]:*      users:(("rpc.statd",pid=869,fd=10))                                            
LISTEN    0      4096                      *:22623                      *:*      users:(("machine-config-",pid=5887,fd=3))                                      
LISTEN    0      4096                      *:22624                      *:*      users:(("machine-config-",pid=5887,fd=5))                                      
LISTEN    0      4096                      *:6080                       *:*      users:(("kube-etcd-signe",pid=4077,fd=5)) 

Good.

[core@okd4-bootstrap ~]$ journalctl -b -f -u bootkube.service
:
:

 2月 06 10:29:07 okd4-bootstrap bootkube.sh[773]: {"level":"warn","ts":"2020-02-06T10:29:07.054Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-b19a717a-310b-4d8c-ab70-3aaeab843ed1/172.16.0.51:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.16.0.51:2379: connect: connection refused\""}
 2月 06 10:29:07 okd4-bootstrap bootkube.sh[773]: https://172.16.0.51:2379 is unhealthy: failed to commit proposal: context deadline exceeded
 2月 06 10:29:07 okd4-bootstrap bootkube.sh[773]: Error: unhealthy cluster
 2月 06 10:29:07 okd4-bootstrap podman[18160]: 2020-02-06 10:29:07.091764495 +0000 UTC m=+5.286713808 container died 57292035acb1f92c36a9c428ebad39c1702261c0e5d1e2e35f9e29133e55112f (image=registry.svc.ci.openshift.org/origin/4.4-2020-02-05-224417@sha256:f0fa1e9c84c55121634eeb5dd992dd0958835d330515c4d0fb0508e92ef3011e, name=etcdctl)
 2月 06 10:29:07 okd4-bootstrap podman[18160]: 2020-02-06 10:29:07.124609463 +0000 UTC m=+5.319558779 container remove 57292035acb1f92c36a9c428ebad39c1702261c0e5d1e2e35f9e29133e55112f (image=registry.svc.ci.openshift.org/origin/4.4-2020-02-05-224417@sha256:f0fa1e9c84c55121634eeb5dd992dd0958835d330515c4d0fb0508e92ef3011e, name=etcdctl)
 2月 06 10:29:07 okd4-bootstrap bootkube.sh[773]: etcdctl failed. Retrying in 5 seconds...

Could it be that etcd is not up on the bootstrap node either? (The bootstrap node is 172.16.0.51.)
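One way to check that suspicion is to look for the etcd client port (2379, the port in the error above) among the listening sockets. A minimal sketch that reads `ss`-style output on stdin, assuming the column layout shown in the `ss -anpt` listing earlier (state first, local address and port fourth); `listens_on_port` is a hypothetical helper name.

```shell
# Hypothetical helper: read ss-style lines on stdin and succeed if any
# LISTEN line has a local address ending in :PORT.
listens_on_port() {
    port=$1
    awk -v p="$port" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # the port is the last colon-separated piece
            if (parts[n] == p) found = 1
        }
        END { exit !found }
    '
}
```

On the bootstrap node this could be used as `sudo ss -lnt | listens_on_port 2379 && echo "etcd port open" || echo "etcd port closed"`. In the listing captured above, 6443 and 22623 appear but 2379 does not.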

Versions used

  • FCOS 31.20200127.2.0 (testing)
  • 4.4.0-0.okd-2020-01-28-022517

The key point is: "reboot once after the bootstrap node has fully come up".

DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL fedora-coreos-31.20200127.2.0-live-kernel-x86_64
    APPEND ip=dhcp rd.neednet=1 initrd=fedora-coreos-31.20200127.2.0-live-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst.install_dev=sda coreos.inst.stream=testing coreos.inst.ignition_url=http://192.168.0.19/cos/bootstrap.ign coreos.inst.image_url=http://192.168.0.19/cos/fedora-coreos-31.20200127.2.0-metal.x86_64.raw.xz
IPAPPEND 2
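The two PXE configs in this note differ only in the FCOS version and stream, so they can be generated from one template. A minimal sketch, assuming the server address and `/cos` path used above; `render_pxe_config` is a hypothetical helper name, and passing `master.ign` or `worker.ign` for the other nodes is an assumption (only the bootstrap config is shown in this log).

```shell
# Hypothetical helper: print a pxelinux config for one node.
# Usage: render_pxe_config FCOS_VERSION STREAM IGNITION_FILE
render_pxe_config() {
    version=$1    # e.g. 31.20200127.2.0
    stream=$2     # stable or testing
    ignition=$3   # e.g. bootstrap.ign
    base="http://192.168.0.19/cos"   # web server and path taken from this log
    prefix="fedora-coreos-$version"
    cat <<EOF
DEFAULT pxeboot
TIMEOUT 20
PROMPT 0
LABEL pxeboot
    KERNEL $prefix-live-kernel-x86_64
    APPEND ip=dhcp rd.neednet=1 initrd=$prefix-live-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst.install_dev=sda coreos.inst.stream=$stream coreos.inst.ignition_url=$base/$ignition coreos.inst.image_url=$base/$prefix-metal.x86_64.raw.xz
IPAPPEND 2
EOF
}
```

Calling `render_pxe_config 31.20200127.2.0 testing bootstrap.ign` reproduces the config above; redirecting the output into the per-host file under `pxelinux.cfg/` keeps the version and stream consistent across nodes.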

This was probably resolved some time ago, but with OKD 4.5 + FCOS 32 no reboot is needed (confirmed 2020-07-16).
