tags
Kubernetes, Kubernetes-dashboard, docker, nvidia driver T4

Kubernetes Installation with Docker Installation, NVIDIA T4 Driver and Kubernetes Dashboard Installation

tags: NVIDIA, Kubernetes, Docker, Kubernetes-dashboard

HW Equipment

Hardware System: SCB 1921B-AA1
OS: Ubuntu 18.04 LTS, kernel 5.4.0-42
GPU: NVIDIA GeForce GTX 1050

Docker Installation

Install the prerequisites:

$ sudo apt-get update
$ sudo apt-get install \
   apt-transport-https \
   ca-certificates \
   curl \
   gnupg-agent \
   software-properties-common

Add Docker's GPG key:

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88

Make sure the output contains this fingerprint:
9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
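
The fingerprint comparison above can be scripted instead of checked by eye. This is a minimal sketch, assuming the fingerprint printed in this guide is still the current one; `has_fingerprint` is a hypothetical helper name, not part of Docker or APT tooling. It ignores whitespace, because `apt-key fingerprint` may group the hex blocks with different spacing.

```shell
#!/usr/bin/env bash
# Expected Docker key fingerprint, copied from the guide above.
EXPECTED_FPR="9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88"

# has_fingerprint EXPECTED: reads a key listing on stdin and succeeds if the
# expected fingerprint appears in it, ignoring all whitespace differences.
has_fingerprint() {
  local want
  want=$(printf '%s' "$1" | tr -d '[:space:]')
  tr -d '[:space:]' | grep -qF "$want"
}

# Usage (assumes apt-key is available, as in the step above):
#   sudo apt-key fingerprint 0EBFCD88 | has_fingerprint "$EXPECTED_FPR" && echo "key OK"
```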

Add repository:

$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update

Install Docker:

$ sudo apt-get install docker-ce docker-ce-cli containerd.io

Check the Docker version:

$ docker version

NVIDIA Driver Installation

Download the .run driver installer from the NVIDIA website:

Note: Select your NVIDIA GPU model and OS before downloading: https://www.nvidia.com/Download/index.aspx?lang=en-us

Before installing, blacklist the nouveau driver.

Create a file:

$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf

Add these lines to blacklist-nouveau.conf:

blacklist nouveau
options nouveau modeset=0

Save the file and exit.
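
As an alternative to editing the file interactively, the same two lines can be written from the shell. This is a minimal sketch; `nouveau_blacklist_lines` is a hypothetical helper name introduced only for illustration, and the target path is the one used in this guide.

```shell
# nouveau_blacklist_lines: prints the two configuration lines shown above,
# one per line, so they can be piped into the blacklist file.
nouveau_blacklist_lines() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Usage (tee runs under sudo because /etc/modprobe.d is root-owned):
#   nouveau_blacklist_lines | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
```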

Finally, rebuild the initramfs and reboot:

$ sudo update-initramfs -u
$ sudo reboot

After restarting, we can use the following command to confirm whether Nouveau has stopped working:

$ lsmod | grep nouveau

If nothing is printed, the Nouveau kernel driver is disabled and NVIDIA's official driver can be installed.
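
A plain `grep nouveau` also matches lines where nouveau only appears in the "Used by" column of another module. The sketch below checks the module-name column exactly; `nouveau_loaded` is a hypothetical helper name, not a standard tool.

```shell
# nouveau_loaded: reads `lsmod` output on stdin and succeeds (exit 0) only if
# a module named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nouveau_loaded; then
#     echo "nouveau is still loaded"
#   else
#     echo "nouveau is disabled"
#   fi
```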

Make the installer executable (the file name depends on the version you downloaded):

$ chmod +x NVIDIA-Linux-x86_64-460.32.03.run

Install gcc and make:

$ sudo apt-get install gcc make

Install the NVIDIA driver (the installer needs root privileges):

$ sudo ./NVIDIA-Linux-x86_64-460.32.03.run

The installer shows some warnings; choose to continue each time and finish the installation. Then reboot:

$ sudo reboot

After the reboot, run nvidia-smi to check that the driver works:

$ nvidia-smi

The output should list the GPU and the installed driver version.

Kubernetes Installation

Disable swap:

$ sudo su
$ swapoff -a

Optional step: to keep swap disabled after a reboot, open /etc/fstab ($ nano /etc/fstab), comment out the swap line by adding # in front of it, like this: #UUID=45fc9fe6-6500-4bca-864e-1effad4764b3, and save.
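
The optional fstab edit can also be done non-interactively. This is a minimal sketch, assuming the swap entry uses `swap` as its filesystem type field; `comment_out_swap` is a hypothetical helper name, and the function keeps a backup next to the edited file.

```shell
# comment_out_swap FILE: prefixes "#" to every uncommented fstab entry that
# contains a whitespace-delimited "swap" field. A backup is kept as FILE.bak.
comment_out_swap() {
  local file="$1"
  sed -E -i.bak \
    '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/#/' "$file"
}

# Usage (run as root for the real file, as in the steps above):
#   comment_out_swap /etc/fstab
```

Lines that are already commented are left untouched, so running the function twice does not add a second `#`.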

Install the prerequisites:

$ sudo apt-get update
$ sudo apt-get install -y apt-transport-https ca-certificates curl
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

Add the Kubernetes repository to the APT sources list:

cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

Update the package index:

$ sudo apt-get update

Install kubeadm, kubectl and kubelet:

$ apt-get install -y kubelet kubeadm kubectl

Check the versions:

$ docker -v
Docker version 17.03.2-ce, build f5ec1e2
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.12.1

Edit the kubelet environment settings in its systemd drop-in config file:

$ gedit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

Start the cluster (this takes about 3-4 minutes):

$ kubeadm init --pod-network-cidr=10.244.0.0/16

Run the following commands as a non-root user:

$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

Check the cluster information:

$ kubectl cluster-info
Kubernetes master is running at https://10.132.0.2:6443
KubeDNS is running at https://10.132.0.2:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

$ kubectl get no -o wide
NAME            STATUS     ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
kube-master-1   NotReady   master   4m26s   v1.12.1   10.132.0.2    <none>        Ubuntu 16.04.5 LTS   4.15.0-1021-gcp   docker://17.3.2

$ kubectl get all --all-namespaces 
NAMESPACE     NAME                                        READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-576cbf47c7-lw7jv                0/1     Pending   0          4m55s
kube-system   pod/coredns-576cbf47c7-ncx8w                0/1     Pending   0          4m55s
kube-system   pod/etcd-kube-master-1                      1/1     Running   0          4m23s
kube-system   pod/kube-apiserver-kube-master-1            1/1     Running   0          3m59s
kube-system   pod/kube-controller-manager-kube-master-1   1/1     Running   0          4m17s
kube-system   pod/kube-proxy-bwrwh                        1/1     Running   0          4m55s
kube-system   pod/kube-scheduler-kube-master-1            1/1     Running   0          4m10s

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP         5m15s
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   5m9s

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           <none>          5m8s

NAMESPACE     NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2         2         2            0           5m9s

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-576cbf47c7   2         2         0       4m56s

Install a CNI plugin (I prefer Weave):

$ kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
. 
.
.

Confirm that the node is now Ready:

$ kubectl get no -o wide
NAME            STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
kube-master-1   Ready    master   9m15s   v1.12.1   10.132.0.2    <none>        Ubuntu 16.04.5 LTS   4.15.0-1021-gcp   docker://17.3.2
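
The node usually needs a short while to go from NotReady to Ready after the CNI plugin is applied. The sketch below parses the STATUS column of the output shown above so the check can be polled in a loop; `all_nodes_ready` is a hypothetical helper name and the 5-second interval is an arbitrary choice.

```shell
# all_nodes_ready: reads `kubectl get nodes` output on stdin and succeeds only
# if at least one node is listed and every STATUS value is exactly "Ready".
all_nodes_ready() {
  awk 'NR > 1 && NF > 0 { n++; if ($2 != "Ready") bad = 1 }
       END { exit (n == 0 || bad) ? 1 : 0 }'
}

# Usage: wait until every node reports Ready.
#   until kubectl get nodes | all_nodes_ready; do sleep 5; done
```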

Deploy the Dashboard

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml

Now create the admin-user account for accessing the dashboard:

kubectl apply -f https://gist.githubusercontent.com/chukaofili/9e94d966e73566eba5abdca7ccb067e6/raw/0f17cd37d2932fb4c3a2e7f4434d08bc64432090/k8s-dashboard-admin-user.yaml

Copy the admin-user token and use it to log in.

Sign in to see the dashboard GUI.
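
One way to obtain the login token is to read it from the admin-user service account secret. This is a sketch under two assumptions that should be checked against the applied YAML and the cluster version: that admin-user lives in the `kubernetes-dashboard` namespace, and that the cluster is older than Kubernetes 1.24, where token secrets are still created automatically (on 1.24 and later, `kubectl create token` is the usual route). `extract_token` is a hypothetical helper name.

```shell
# extract_token: reads `kubectl describe secret` output on stdin and prints
# the value of its "token:" field.
extract_token() {
  awk '$1 == "token:" { print $2 }'
}

# Usage (NS is an assumption; use the namespace from the applied YAML):
#   NS=kubernetes-dashboard
#   SECRET=$(kubectl -n "$NS" get secret -o name | grep admin-user)
#   kubectl -n "$NS" describe "$SECRET" | extract_token
#   kubectl proxy   # serves the API, including the dashboard, on localhost:8001
```

With `kubectl proxy` running, the dashboard deployed by recommended.yaml is normally reachable at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/, where the printed token can be pasted into the login form.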

## Sources:
1. https://clay-atlas.com/blog/2020/02/11/linux-chinese-note-nvidia-driver-nouveau-kernel/
2. https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver
3. https://docs.docker.com/engine/install/ubuntu/
4. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
5. https://stackoverflow.com/questions/52720380/kubernetes-api-server-is-not-starting-on-a-single-kubeadm-cluster
