racadm>>racadm remoteimage -c -l 10.10.2.242:/export/slate3.img
Remote Image is now Configured
racadm>>racadm serveraction powercycle
Server power operation initiated successfully
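Before power-cycling, you can double-check that the mount took; `racadm remoteimage -s` prints the current remote-image status (exact output varies by iDRAC firmware):

```
racadm remoteimage -s
```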
/admin1-> console com2
Connected to Serial Device 2. To end type: ^\
Now it's mounted and the BIOS sees it:
Booting from Virtual Floppy Drive
iPXE initialising devices...ok
iPXE 1.0.0+ (af18) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP VLAN AoE EFI Menu
SLATE NetBoot v0.30
I hit 'm' to go to the iPXE shell; otherwise it will try to boot with DHCP. Then you'll see:
SLATE NetBoot Failsafe Menu
Boot to local drive
Manual network configuration
Retry boot
iPXE Debug Shell
Reboot System
I go to 'manual network configuration' and plug in values as appropriate:
Network Configuration:
Available interfaces...
net0: 24:6e:96:c6:43:94 using i350 on 0000:01:00.0 (closed)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
net1: 24:6e:96:c6:43:95 using i350 on 0000:01:00.1 (closed)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
net2: 24:6e:96:c6:43:90 using 82599-sfp on 0000:18:00.0 (closed)
[Link:down, TX:0 TXE:0 RX:0 RXE:0]
[Link status: Down (http://ipxe.org/38086193)]
net3: 24:6e:96:c6:43:92 using 82599-sfp on 0000:18:00.1 (closed)
[Link:down, TX:0 TXE:0 RX:0 RXE:0]
[Link status: Down (http://ipxe.org/38086193)]
Set network interface number [0 for net0, defaults to 0]: 1
IP:192.41.231.235
Subnet mask:255.255.254.0
Gateway:192.41.230.1
DNS:8.8.8.8
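The same settings can also be applied by hand from the iPXE shell (the 'm' option above). A sketch, assuming net1 is the interface you want; the setting names are standard iPXE, though the menu may wrap them differently:

```
iPXE> set net1/ip 192.41.231.235
iPXE> set net1/netmask 255.255.254.0
iPXE> set net1/gateway 192.41.230.1
iPXE> set dns 8.8.8.8
iPXE> ifopen net1
```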
You will then see:
Attempting chainload of SLATE Image Repo...
Could not start download: Operation not supported (http://ipxe.org/3c092083)
HTTPS appears to have failed... attempting HTTP
http://192.170.227.197/~lincolnb/slate.ipxe... ok
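The "Operation not supported" error (ipxe.org/3c092083) usually means the iPXE build lacks HTTPS support (the DOWNLOAD_PROTO_HTTPS build option). The menu evidently falls back using iPXE's `||` operator, roughly like this (the actual NetBoot script isn't shown here, so this is a sketch):

```
#!ipxe
chain https://192.170.227.197/~lincolnb/slate.ipxe || chain http://192.170.227.197/~lincolnb/slate.ipxe
```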
Which will take you to this menu:
SLATE Image Selection Menu
PerfSONAR Installer
SLATE CoreOS Edge Service
SLATE Management Node (EL7)
Configuration
Enable serial console
SPECIAL
SLATE Edge Service Node (UMich)
I go to "Enable serial console":
Configure console...
Console (default tty0):ttyS0
Baud rate (default none):115200
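These answers presumably end up as a kernel console argument appended to the boot line, i.e. something like:

```
console=ttyS0,115200
```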
Then I go back and pick the "UMich" entry from the SPECIAL tab.
It chugged for a while because I didn't have the nginx
server started on sl-um-oob1:
[** ] A start job is running for Ignition (disks) (11s / no limit)[ 18.084952] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #7
[ 18.093198] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[ *** ] A start job is running for Ignition (disks) (12s / no limit)[ 19.283867] systemd-networkd[549]: eth2: DHCPv4 address 192.41.231.235/23 via 192.41.230.1
[ 19.293217] systemd-networkd[549]: Not connected to system bus, not setting hostname.
[ *** ] A start job is running for Ignition (disks) (16s / no limit)[ 23.085488] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #8
[ 23.094289] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[** ] A start job is running for Ignition (disks) (18s / no limit)[ 25.161800] systemd-networkd[549]: eth0: Configured
[ 25.225418] systemd-networkd[549]: eth2: Configured
[ *** ] A start job is running for Ignition (disks) (21s / no limit)[ 28.086164] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #9
[ 28.094179] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[** ] A start job is running for Ignition (disks) (26s / no limit)[ 33.086559] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #10
[ 33.095152] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[ *] A start job is running for Ignition (disks) (31s / no limit)[ 38.087104] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #11
[ 38.095321] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[*** ] A start job is running for Ignition (disks) (36s / no limit)[ 43.087648] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #12
[ 43.096227] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[ *** ] A start job is running for Ignition (disks) (41s / no limit)[ 48.088276] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #13
[ 48.096271] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[ *] A start job is running for Ignition (disks) (46s / no limit)[ 53.089042] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #14
[ 53.097104] ignition[593]: GET result: OK
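Ignition just polls the configured HTTP URL until a GET succeeds, so any web server that can serve `slate.ign` will do. A minimal local stand-in for the nginx instance on sl-um-oob1 (python3's http.server here; the port, paths, and stub config contents are arbitrary):

```shell
# Serve a stub slate.ign the way nginx on sl-um-oob1 eventually did;
# python3's http.server is a stand-in, and the port/paths are arbitrary.
mkdir -p /tmp/ign
printf '{"ignition":{"version":"2.2.0"}}' > /tmp/ign/slate.ign
python3 -m http.server 18080 --directory /tmp/ign >/dev/null 2>&1 &
SRV=$!
sleep 1
RESP=$(curl -fsS http://127.0.0.1:18080/slate.ign)
kill "$SRV"
echo "$RESP"
```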
Once I started nginx, it downloaded the next stage of the boot process:
This is localhost (Linux x86_64 4.14.67-coreos) 20:16:08
SSH host key: SHA256:TtlvKUfFWuO2yhn9qvO2O/2VB8nV/ioAX/BFDlxomp0 (ED25519)
SSH host key: SHA256:5+DTuYpdcxO3LFE1cGpkQW/64uxrBDZ1RBw7+d2uH10 (ECDSA)
SSH host key: SHA256:ASVq1/Ykco+VUPykbA7hf8elhG43wK1gDa7T9zcy9B0 (DSA)
SSH host key: SHA256:hcqDAd9zm8ps3o0y4lFA1Am/UXiERF3C0SXNsrcXU6I (RSA)
eno1:
eno2:
eno3:
eno4:
idrac: 169.254.1.2 fe80::d294:66ff:fe5f:f278
localhost login: core (automatic login)
Last login: Thu Oct 4 20:16:08 UTC 2018 on tty1
Container Linux by CoreOS stable (1855.4.0)
Update Strategy: No Reboots
core@localhost ~ $ [ 62.503315] igb 0000:01:00.0 eno3: igb: eno3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 62.512519] IPv6: ADDRCONF(NETDEV_CHANGE): eno3: link becomes ready
[ 62.764311] igb 0000:01:00.1 eno4: igb: eno4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 62.773787] IPv6: ADDRCONF(NETDEV_CHANGE): eno4: link becomes ready
core@localhost ~ $
core@localhost ~ $
It automatically DHCP'd on eno3 as well (10.x), but I turned that interface off just in case:
root@sl-um-es1 network # ip link set dev eno3 down
The next thing to do is follow the steps from this guide:
https://kubernetes.io/docs/setup/independent/install-kubeadm/
Specifically:
CNI_VERSION="v0.6.0"
mkdir -p /opt/cni/bin
curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-amd64-${CNI_VERSION}.tgz" | tar -C /opt/cni/bin -xz
CRICTL_VERSION="v1.11.1"
mkdir -p /opt/bin
curl -L "https://github.com/kubernetes-incubator/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" | tar -C /opt/bin -xz
RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
mkdir -p /opt/bin
cd /opt/bin
curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
chmod +x {kubeadm,kubelet,kubectl}
curl -sSL "https://raw.githubusercontent.com/kubernetes/kubernetes/${RELEASE}/build/debs/kubelet.service" | sed "s:/usr/bin:/opt/bin:g" > /etc/systemd/system/kubelet.service
mkdir -p /etc/systemd/system/kubelet.service.d
curl -sSL "https://raw.githubusercontent.com/kubernetes/kubernetes/${RELEASE}/build/debs/10-kubeadm.conf" | sed "s:/usr/bin:/opt/bin:g" > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
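The `sed "s:/usr/bin:/opt/bin:g"` in the two unit-file downloads matters on Container Linux because `/usr` is mounted read-only, so the binaries have to live in `/opt/bin`. Here's that substitution applied to a sample ExecStart line:

```shell
# Container Linux's /usr is read-only, so the upstream unit files must be
# rewritten to point at /opt/bin. The sed from above, on a sample line:
LINE='ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS'
REWRITTEN=$(printf '%s\n' "$LINE" | sed 's:/usr/bin:/opt/bin:g')
echo "$REWRITTEN"
```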
Then I remounted the SLATE data from the NVMe, symlinked the directories back into place, and fixed the hostname:
mkdir /run/slate
mount /dev/nvme0n1 /run/slate
cd /var/lib/
ln -s /run/slate/docker .
ln -s /run/slate/kubelet .
cd /etc
ln -s /run/slate/kubernetes .
hostnamectl set-hostname sl-um-es1.slateci.io
(if we just want the Kubernetes config and not the containers themselves, we could also put this on the SD cards)
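The symlink step as a self-contained sketch, with scratch directories standing in for the real NVMe mount and targets (the actual session used `/run/slate`, `/var/lib`, and `/etc`):

```shell
# Sketch of the symlink-back step; scratch dirs stand in for the real
# NVMe mount (/run/slate) and targets (/var/lib, /etc).
SLATE=$(mktemp -d)    # stands in for /run/slate
VARLIB=$(mktemp -d)   # stands in for /var/lib
mkdir -p "$SLATE/docker" "$SLATE/kubelet"
for d in docker kubelet; do
  ln -s "$SLATE/$d" "$VARLIB/$d"
done
readlink "$VARLIB/docker"
```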
Then I restarted docker and kubelet:
root@sl-um-es1 etc # systemctl start docker
[ 540.232767] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 540.248479] Bridge firewalling registered
[ 540.262573] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 540.386557] Initializing XFRM netlink socket
[ 540.396107] Netfilter messages via NETLINK v0.30.
[ 540.404296] ctnetlink v0.93: registering with nfnetlink.
[ 540.500561] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
root@sl-um-es1 etc # [ 541.131548] DMAR: Allocating domain for dcdbas failed
[ 541.272305] DMAR: Allocating domain for dcdbas failed
[ 541.311994] DMAR: Allocating domain for dcdbas failed
[ 542.366708] DMAR: Allocating domain for dcdbas failed
[ 542.613241] DMAR: Allocating domain for dcdbas failed
[ 542.655627] DMAR: Allocating domain for dcdbas failed
[ 542.798551] DMAR: Allocating domain for dcdbas failed
root@sl-um-es1 etc #
root@sl-um-es1 etc # systemctl start kubelet
[ 547.834930] Fusion MPT base driver 3.04.20
[ 547.839036] Copyright (c) 1999-2008 LSI Corporation
[ 547.846728] Fusion MPT misc device (ioctl) driver 3.04.20
[ 547.852179] mptctl: Registered with Fusion MPT base driver
[ 547.857672] mptctl: /dev/mptctl @ (major,minor=10,220)
[ 547.896689] mpt3sas version 15.100.00.00 loaded
It takes a few minutes, but it recovers:
[root@sl-um-oob1 export]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
sl-um-es1.slateci.io NotReady <none> 20d v1.11.3
sl-um-oob1.slateci.io Ready master 42d v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
sl-um-es1.slateci.io NotReady <none> 20d v1.12.0
sl-um-oob1.slateci.io Ready master 42d v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
sl-um-es1.slateci.io NotReady <none> 20d v1.12.0
sl-um-oob1.slateci.io Ready master 42d v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
sl-um-es1.slateci.io NotReady <none> 20d v1.12.0
sl-um-oob1.slateci.io Ready master 42d v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
sl-um-es1.slateci.io Ready <none> 20d v1.12.0
sl-um-oob1.slateci.io Ready master 42d v1.11.2
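The repeated `kubectl get nodes` above is just manual polling; the Ready check it amounts to can be scripted. A sketch (the `is_ready` helper is hypothetical, checked here against sample lines from the session rather than a live cluster):

```shell
# Hypothetical helper: does a `kubectl get nodes` line report Ready?
# Checked against sample lines from the session above, not a live cluster.
is_ready() { printf '%s\n' "$1" | awk '{print $2}' | grep -qx 'Ready'; }

R1=$(is_ready 'sl-um-es1.slateci.io   NotReady   <none>   20d   v1.12.0' && echo up || echo waiting)
R2=$(is_ready 'sl-um-es1.slateci.io   Ready      <none>   20d   v1.12.0' && echo up || echo waiting)
echo "$R1 $R2"
```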