racadm>>racadm remoteimage -c -l 10.10.2.242:/export/slate3.img 

racadm remoteimage -c -l 10.10.2.242:/export/slate3.img  
Remote Image is now Configured
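
Before rebooting, you can sanity-check the mount: racadm can report the remote image status, and the same subcommand tears it down when you're done (flags from memory of the racadm docs; verify against your firmware version):

racadm remoteimage -s
racadm remoteimage -d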

racadm>>racadm serveraction powercycle 

racadm serveraction powercycle  
Server power operation initiated successfully
/admin1-> console com2

Connected to Serial Device 2. To end type: ^\

Now the image is mounted, and the BIOS sees it:

Booting from Virtual Floppy Drive
iPXE initialising devices...ok

iPXE 1.0.0+ (af18) -- Open Source Network Boot Firmware -- http://ipxe.org
Features: DNS HTTP iSCSI TFTP SRP VLAN AoE EFI Menu
SLATE NetBoot v0.30

I hit 'm' to go to the iPXE shell. Otherwise it will try to boot with DHCP.

Then you'll see:


                          SLATE NetBoot Failsafe Menu
      
   Boot to local drive                                                         
   Manual network configuration                                                
   Retry boot                                                                  
   iPXE Debug Shell                                                            
   Reboot System                                                               

I go to 'manual network configuration' and plug in values as appropriate:

Network Configuration:
Available interfaces...
net0: 24:6e:96:c6:43:94 using i350 on 0000:01:00.0 (closed)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
net1: 24:6e:96:c6:43:95 using i350 on 0000:01:00.1 (closed)
  [Link:up, TX:0 TXE:0 RX:0 RXE:0]
net2: 24:6e:96:c6:43:90 using 82599-sfp on 0000:18:00.0 (closed)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086193)]
net3: 24:6e:96:c6:43:92 using 82599-sfp on 0000:18:00.1 (closed)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Down (http://ipxe.org/38086193)]
Set network interface number [0 for net0, defaults to 0]: 1
IP:192.41.231.235
Subnet mask:255.255.254.0
Gateway:192.41.230.1
DNS:8.8.8.8
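
(My assumption is that the menu is just wrapping the equivalent iPXE shell commands, so something like this should also work from the debug shell if the menu ever misbehaves; untested here:)

ifopen net1
set net1/ip 192.41.231.235
set net1/netmask 255.255.254.0
set net1/gateway 192.41.230.1
set dns 8.8.8.8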

You will then see:

Attempting chainload of SLATE Image Repo...
Could not start download: Operation not supported (http://ipxe.org/3c092083)
HTTPS appears to have failed... attempting HTTP
http://192.170.227.197/~lincolnb/slate.ipxe... ok 
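
The HTTPS failure is presumably just the iPXE build lacking HTTPS support, and the HTTP fallback boils down to a plain chain command, something like:

chain http://192.170.227.197/~lincolnb/slate.ipxe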

Which will take you to this menu:


                           SLATE Image Selection Menu
      
   PerfSONAR Installer                                                         
   SLATE CoreOS Edge Service                                                   
   SLATE Management Node (EL7)                                                 
   Configuration                                                               
   Enable serial console                                                       
   SPECIAL                                                                     
   SLATE Edge Service Node (UMich)                                                                                                                       

I go to "Enable serial console":

Configure console...
Console (default tty0):ttyS0
Baud rate (default none):115200
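
My assumption is that this just appends the usual serial console arguments to the kernel command line, i.e. something like:

console=tty0 console=ttyS0,115200n8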

Then I go back and pick the "UMich" entry from the SPECIAL tab.

It chugged for a while because I didn't have the nginx server started on sl-um-oob1:

[**    ] A start job is running for Ignition (disks) (11s / no limit)[   18.084952] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #7
[   18.093198] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[ ***  ] A start job is running for Ignition (disks) (12s / no limit)[   19.283867] systemd-networkd[549]: eth2: DHCPv4 address 192.41.231.235/23 via 192.41.230.1
[   19.293217] systemd-networkd[549]: Not connected to system bus, not setting hostname.
[  *** ] A start job is running for Ignition (disks) (16s / no limit)[   23.085488] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #8
[   23.094289] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[**    ] A start job is running for Ignition (disks) (18s / no limit)[   25.161800] systemd-networkd[549]: eth0: Configured
[   25.225418] systemd-networkd[549]: eth2: Configured
[  *** ] A start job is running for Ignition (disks) (21s / no limit)[   28.086164] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #9
[   28.094179] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[**    ] A start job is running for Ignition (disks) (26s / no limit)[   33.086559] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #10
[   33.095152] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[     *] A start job is running for Ignition (disks) (31s / no limit)[   38.087104] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #11
[   38.095321] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[***   ] A start job is running for Ignition (disks) (36s / no limit)[   43.087648] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #12
[   43.096227] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[ ***  ] A start job is running for Ignition (disks) (41s / no limit)[   48.088276] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #13
[   48.096271] ignition[593]: GET error: Get http://10.10.2.242/slate.ign: dial tcp 10.10.2.242:80: connect: connection refused
[     *] A start job is running for Ignition (disks) (46s / no limit)[   53.089042] ignition[593]: GET http://10.10.2.242/slate.ign: attempt #14
[   53.097104] ignition[593]: GET result: OK
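
The fix on sl-um-oob1 was simply starting the web server and confirming it serves the Ignition config (the curl check is mine, not from the original session):

systemctl start nginx
curl -sI http://10.10.2.242/slate.ign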

Once nginx was up, it downloaded the next stage of the boot process:

This is localhost (Linux x86_64 4.14.67-coreos) 20:16:08
SSH host key: SHA256:TtlvKUfFWuO2yhn9qvO2O/2VB8nV/ioAX/BFDlxomp0 (ED25519)
SSH host key: SHA256:5+DTuYpdcxO3LFE1cGpkQW/64uxrBDZ1RBw7+d2uH10 (ECDSA)
SSH host key: SHA256:ASVq1/Ykco+VUPykbA7hf8elhG43wK1gDa7T9zcy9B0 (DSA)
SSH host key: SHA256:hcqDAd9zm8ps3o0y4lFA1Am/UXiERF3C0SXNsrcXU6I (RSA)
eno1:  
eno2:  
eno3:  
eno4:  
idrac: 169.254.1.2 fe80::d294:66ff:fe5f:f278

localhost login: core (automatic login)

Last login: Thu Oct  4 20:16:08 UTC 2018 on tty1
Container Linux by CoreOS stable (1855.4.0)
Update Strategy: No Reboots
core@localhost ~ $ [   62.503315] igb 0000:01:00.0 eno3: igb: eno3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   62.512519] IPv6: ADDRCONF(NETDEV_CHANGE): eno3: link becomes ready
[   62.764311] igb 0000:01:00.1 eno4: igb: eno4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[   62.773787] IPv6: ADDRCONF(NETDEV_CHANGE): eno4: link becomes ready

core@localhost ~ $ 
core@localhost ~ $ 

It automatically DHCP'd on eno3 as well (10.x), but I turned that interface off just in case:

root@sl-um-es1 network # ip link set dev eno3 down
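
(ip link set isn't persistent; since this node boots from the netboot image each time, the durable place for this would be the Ignition config or a networkd unit. A sketch, assuming eno3 should stay unmanaged:)

cat > /etc/systemd/network/10-eno3.network <<'EOF'
[Match]
Name=eno3

[Link]
Unmanaged=yes
EOF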

The next thing to do is follow the steps from this guide:

https://kubernetes.io/docs/setup/independent/install-kubeadm/

Specifically:

CNI_VERSION="v0.6.0"
mkdir -p /opt/cni/bin
curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-amd64-${CNI_VERSION}.tgz" | tar -C /opt/cni/bin -xz
CRICTL_VERSION="v1.11.1"
mkdir -p /opt/bin
curl -L "https://github.com/kubernetes-incubator/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" | tar -C /opt/bin -xz
RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
mkdir -p /opt/bin
cd /opt/bin
curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
chmod +x {kubeadm,kubelet,kubectl}
curl -sSL "https://raw.githubusercontent.com/kubernetes/kubernetes/${RELEASE}/build/debs/kubelet.service" | sed "s:/usr/bin:/opt/bin:g" > /etc/systemd/system/kubelet.service
mkdir -p /etc/systemd/system/kubelet.service.d
curl -sSL "https://raw.githubusercontent.com/kubernetes/kubernetes/${RELEASE}/build/debs/10-kubeadm.conf" | sed "s:/usr/bin:/opt/bin:g" > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Then I remounted the SLATE data from the NVMe drive, symlinked the directories back into place, and fixed the hostname:

mkdir /run/slate
mount /dev/nvme0n1 /run/slate
cd /var/lib/
ln -s /run/slate/docker .
ln -s /run/slate/kubelet .
cd /etc
ln -s /run/slate/kubernetes .
hostnamectl set-hostname sl-um-es1.slateci.io
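
A quick check that the links resolve before starting anything:

ls -ld /var/lib/docker /var/lib/kubelet /etc/kubernetes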

(if we just want the Kubernetes config and not the containers themselves, we could also put this on the SD cards)

Then I started Docker and the kubelet:

root@sl-um-es1 etc # systemctl start docker
[  540.232767] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[  540.248479] Bridge firewalling registered
[  540.262573] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[  540.386557] Initializing XFRM netlink socket
[  540.396107] Netfilter messages via NETLINK v0.30.
[  540.404296] ctnetlink v0.93: registering with nfnetlink.
[  540.500561] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
root@sl-um-es1 etc # [  541.131548] DMAR: Allocating domain for dcdbas failed
[  541.272305] DMAR: Allocating domain for dcdbas failed
[  541.311994] DMAR: Allocating domain for dcdbas failed
[  542.366708] DMAR: Allocating domain for dcdbas failed
[  542.613241] DMAR: Allocating domain for dcdbas failed
[  542.655627] DMAR: Allocating domain for dcdbas failed
[  542.798551] DMAR: Allocating domain for dcdbas failed

root@sl-um-es1 etc # 
root@sl-um-es1 etc # systemctl start kubelet
[  547.834930] Fusion MPT base driver 3.04.20
[  547.839036] Copyright (c) 1999-2008 LSI Corporation
[  547.846728] Fusion MPT misc device (ioctl) driver 3.04.20
[  547.852179] mptctl: Registered with Fusion MPT base driver
[  547.857672] mptctl: /dev/mptctl @ (major,minor=10,220)

[  547.896689] mpt3sas version 15.100.00.00 loaded

It should take a few minutes, but the node will recover:

[root@sl-um-oob1 export]# kubectl get nodes
NAME                    STATUS     ROLES     AGE       VERSION
sl-um-es1.slateci.io    NotReady   <none>    20d       v1.11.3
sl-um-oob1.slateci.io   Ready      master    42d       v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME                    STATUS     ROLES     AGE       VERSION
sl-um-es1.slateci.io    NotReady   <none>    20d       v1.12.0
sl-um-oob1.slateci.io   Ready      master    42d       v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME                    STATUS     ROLES     AGE       VERSION
sl-um-es1.slateci.io    NotReady   <none>    20d       v1.12.0
sl-um-oob1.slateci.io   Ready      master    42d       v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME                    STATUS     ROLES     AGE       VERSION
sl-um-es1.slateci.io    NotReady   <none>    20d       v1.12.0
sl-um-oob1.slateci.io   Ready      master    42d       v1.11.2
[root@sl-um-oob1 export]# kubectl get nodes
NAME                    STATUS    ROLES     AGE       VERSION
sl-um-es1.slateci.io    Ready     <none>    20d       v1.12.0
sl-um-oob1.slateci.io   Ready     master    42d       v1.11.2
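
(Rather than re-running the command by hand, kubectl can watch for the transition:)

kubectl get nodes --watch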