runai-logs
[LOG] initializing Kubernetes client...
[LOG] successfully initialized Kubernetes client
cleaning up previous deployment if it exists...
waiting for all resources to be deleted...
all resources were successfully deleted
deploying runai diagnostics tool...
[TEST] running external cluster tests...
--------------------------------------------------
[TEST] GPU Nodes
--------------------------------------------------
[LOG] please verify that the list above includes all GPU nodes in the cluster
[LOG] if you suspect GPU nodes are missing from the list above, gpu-feature-discovery might be malfunctioning
[PASS]
[TEST] Nvidia device plugin
--------------------------------------------------
[PASS]
[TEST] DCGM Exporter
--------------------------------------------------
[PASS]
[TEST] Nginx Ingress Controller
--------------------------------------------------
[WARNING] nginx ingress controller is installed in the cluster
[WARNING]
[TEST] Cluster Version
--------------------------------------------------
[LOG] Kubernetes Cluster Version: v1.25.8-gke.1000
[PASS]
[TEST] Storage Classes
--------------------------------------------------
[LOG] StorageClasses in cluster:
[LOG] premium-rwo
[LOG] standard
[LOG] standard-rwo
[PASS]
[TEST] Prometheus check
--------------------------------------------------
[WARNING] prometheus is installed in the cluster
[WARNING]
[TEST] Node Feature Discovery
--------------------------------------------------
[WARNING] node-feature-discovery is installed in the cluster
[WARNING]
[TEST] GPU Feature Discovery
--------------------------------------------------
[PASS]
[TEST] List Pods
--------------------------------------------------
[LOG] Namespace/Name/Phase
[LOG] cert-manager/cert-manager-544cd78564-khrft/Running
[LOG] cert-manager/cert-manager-cainjector-676b44b449-8c5pp/Running
[LOG] cert-manager/cert-manager-webhook-5c64c6c6f9-c9brc/Running
[LOG] default/tensorflow-benchmarks-launcher-w2hk2/Succeeded
[LOG] gpu-operator/gpu-operator-1686569784-node-feature-discovery-master-76b4z5gq6/Running
[LOG] gpu-operator/gpu-operator-1686569784-node-feature-discovery-worker-8kwz9/Running
[LOG] gpu-operator/gpu-operator-1686569784-node-feature-discovery-worker-mftp9/Running
[LOG] gpu-operator/gpu-operator-6495fb4657-kpggj/Running
[LOG] kube-system/event-exporter-gke-755c4b4d97-rnbfb/Running
[LOG] kube-system/fluentbit-gke-2lzz6/Running
[LOG] kube-system/fluentbit-gke-74krn/Running
[LOG] kube-system/fluentbit-gke-7zg6s/Running
[LOG] kube-system/fluentbit-gke-9fszk/Running
[LOG] kube-system/fluentbit-gke-q89lw/Running
[LOG] kube-system/gke-metadata-server-44v9w/Running
[LOG] kube-system/gke-metadata-server-5sb6x/Running
[LOG] kube-system/gke-metadata-server-kxz4q/Running
[LOG] kube-system/gke-metadata-server-vcrlh/Running
[LOG] kube-system/gke-metadata-server-zx6x6/Running
[LOG] kube-system/gke-metrics-agent-2d2gz/Running
[LOG] kube-system/gke-metrics-agent-4lv8t/Running
[LOG] kube-system/gke-metrics-agent-mpcwf/Running
[LOG] kube-system/gke-metrics-agent-r2vhh/Running
[LOG] kube-system/gke-metrics-agent-zh649/Running
[LOG] kube-system/kube-dns-5b5dfcd97b-bxb8v/Running
[LOG] kube-system/kube-dns-5b5dfcd97b-lbhmv/Running
[LOG] kube-system/kube-dns-autoscaler-5f56f8997c-lcwpw/Running
[LOG] kube-system/kube-proxy-gke-runai-mvp-runai-gpu-pool-a3994fbf-5xvz/Running
[LOG] kube-system/kube-proxy-gke-runai-mvp-runai-gpu-pool-cbec8e56-fzdb/Running
[LOG] kube-system/kube-proxy-gke-runai-mvp-runai-pool-4a223292-5mql/Running
[LOG] kube-system/kube-proxy-gke-runai-mvp-runai-pool-a6becd78-j7qs/Running
[LOG] kube-system/kube-proxy-gke-runai-mvp-system-b41ae89e-9ibp/Running
[LOG] kube-system/l7-default-backend-8cdcff48c-6fj8s/Running
[LOG] kube-system/metrics-server-v0.5.2-855ff55569-w8p6z/Running
[LOG] kube-system/netd-26hrz/Running
[LOG] kube-system/netd-bjr2g/Running
[LOG] kube-system/netd-jkcrc/Running
[LOG] kube-system/netd-vdvcq/Running
[LOG] kube-system/netd-zmx6v/Running
[LOG] kube-system/nvidia-driver-installer-2phgd/Running
[LOG] kube-system/nvidia-driver-installer-qfxfd/Running
[LOG] kube-system/nvidia-gpu-device-plugin-medium-d79kl/Running
[LOG] kube-system/nvidia-gpu-device-plugin-medium-fj8fv/Running
[LOG] kube-system/pdcsi-node-2dnb2/Running
[LOG] kube-system/pdcsi-node-9ddlb/Running
[LOG] kube-system/pdcsi-node-qnt6w/Running
[LOG] kube-system/pdcsi-node-s654d/Running
[LOG] kube-system/pdcsi-node-vpkxh/Running
[LOG] monitoring/alertmanager-prometheus-kube-prometheus-alertmanager-0/Running
[LOG] monitoring/prometheus-kube-prometheus-operator-6ddd77f99b-db29x/Running
[LOG] monitoring/prometheus-kube-state-metrics-7b7455ff5d-h8gzr/Running
[LOG] monitoring/prometheus-prometheus-kube-prometheus-prometheus-0/Running
[LOG] monitoring/prometheus-prometheus-node-exporter-6ml69/Running
[LOG] monitoring/prometheus-prometheus-node-exporter-9pmb5/Running
[LOG] monitoring/prometheus-prometheus-node-exporter-l8dkz/Running
[LOG] monitoring/prometheus-prometheus-node-exporter-mzvwq/Running
[LOG] monitoring/prometheus-prometheus-node-exporter-qc65k/Running
[LOG] mpi-operator/mpi-operator-76fbc4d578-7jfmv/Running
[LOG] nginx-ingress/nginx-ingress-ingress-nginx-controller-867dc6b6c5-4h2nz/Running
[LOG] runai-preinstall-diagnostics/runai-preinstall-diagnostics-bc48w/Pending
[LOG] runai-preinstall-diagnostics/runai-preinstall-diagnostics-qgss6/Pending
[PASS]
[TEST] running internal cluster tests using image gcr.io/run-ai-lab/preinstall-diagnostics:v2.4.0...
--------------------------------------------------
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] all daemonset pods are available
[LOG] logs for [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w] are not ready yet, retrying in 5 seconds...
[LOG] logs for [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w] are not ready yet, retrying in 5 seconds...
========================== LOGS FROM NODE gke-runai-mvp-runai-pool-4a223292-5mql ==========================
Logs for [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]:
[LOG] initializing Kubernetes client...
[LOG] successfully initialized Kubernetes client
[TEST] Run:AI service access: https://app.run.ai
--------------------------------------------------
[LOG] Run:AI service is accessible from within the cluster
[PASS]
[TEST] DNS Servers access
--------------------------------------------------
[LOG] Address for [app.run.ai] is [104.21.95.156], resolved by [1.1.1.1:53]
[LOG] Address for [app.run.ai] is [172.67.145.177], resolved by [8.8.8.8:53]
[PASS]
[TEST] Dynu DNS service access: https://api.dynu.com
--------------------------------------------------
[LOG] Dynu DNS service is accessible from within the cluster
[PASS]
[TEST] Connectivity to runai container registry: https://gcr.io/run-ai-prod
--------------------------------------------------
[LOG] Run:AI container registry is accessible
[PASS]
[TEST] DNS Resolver
--------------------------------------------------
[WARNING] Backend FQDN was not provided using the --domain flag, skipping test
[SKIP]
[PASS]
[TEST] Print resolv.conf
--------------------------------------------------
[LOG] search runai-preinstall-diagnostics.svc.cluster.local svc.cluster.local cluster.local me-west1-b.c.runai-prod.internal c.runai-prod.internal google.internal
nameserver 10.0.32.10
options ndots:5
[PASS]
[TEST] Run:AI Helm Repository
--------------------------------------------------
[PASS]
[TEST] DockerHub
--------------------------------------------------
[PASS]
[TEST] Quay.io
--------------------------------------------------
[PASS]
[TEST] Run:AI Prometheus
--------------------------------------------------
[PASS]
[TEST] Run:AI Auth Provider
--------------------------------------------------
[PASS]
[TEST] OS Information
--------------------------------------------------
[LOG] Os Info: Linux runai-preinstall-diagnostics-bc48w 5.15.89+ #1 SMP Sat Mar 18 09:27:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[PASS]
[TEST] Node connectivity check
--------------------------------------------------
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] all daemonset pods are available
[LOG] attempting to ping pod [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]...
[LOG] [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w] -> [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]: successfully pinged
[LOG] [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w] -> [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]: node clocks are in sync
[LOG] attempting to ping pod [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]...
[LOG] [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w] -> [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]: successfully pinged
[LOG] [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w] -> [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]: node clocks are in sync
[PASS]
[COMPLETE]
========================== LOGS FROM NODE gke-runai-mvp-runai-pool-a6becd78-j7qs ==========================
Logs for [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]:
[LOG] initializing Kubernetes client...
[LOG] successfully initialized Kubernetes client
[TEST] Run:AI service access: https://app.run.ai
--------------------------------------------------
[LOG] Run:AI service is accessible from within the cluster
[PASS]
[TEST] DNS Servers access
--------------------------------------------------
[LOG] Address for [app.run.ai] is [172.67.145.177], resolved by [1.1.1.1:53]
[LOG] Address for [app.run.ai] is [104.21.95.156], resolved by [8.8.8.8:53]
[PASS]
[TEST] Dynu DNS service access: https://api.dynu.com
--------------------------------------------------
[LOG] Dynu DNS service is accessible from within the cluster
[PASS]
[TEST] Connectivity to runai container registry: https://gcr.io/run-ai-prod
--------------------------------------------------
[LOG] Run:AI container registry is accessible
[PASS]
[TEST] DNS Resolver
--------------------------------------------------
[WARNING] Backend FQDN was not provided using the --domain flag, skipping test
[SKIP]
[PASS]
[TEST] Print resolv.conf
--------------------------------------------------
[LOG] search runai-preinstall-diagnostics.svc.cluster.local svc.cluster.local cluster.local me-west1-c.c.runai-prod.internal c.runai-prod.internal google.internal
nameserver 10.0.32.10
options ndots:5
[PASS]
[TEST] Run:AI Helm Repository
--------------------------------------------------
[PASS]
[TEST] DockerHub
--------------------------------------------------
[PASS]
[TEST] Quay.io
--------------------------------------------------
[PASS]
[TEST] Run:AI Prometheus
--------------------------------------------------
[PASS]
[TEST] Run:AI Auth Provider
--------------------------------------------------
[PASS]
[TEST] OS Information
--------------------------------------------------
[LOG] Os Info: Linux runai-preinstall-diagnostics-qgss6 5.15.89+ #1 SMP Sat Mar 18 09:27:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[PASS]
[TEST] Node connectivity check
--------------------------------------------------
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] not all pods are ready [0/2], retrying in 5 seconds
[LOG] all daemonset pods are available
[LOG] attempting to ping pod [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]...
[LOG] [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6] -> [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]: successfully pinged
[LOG] [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6] -> [gke-runai-mvp-runai-pool-4a223292-5mql/runai-preinstall-diagnostics-bc48w]: node clocks are in sync
[LOG] attempting to ping pod [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]...
[LOG] [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6] -> [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]: successfully pinged
[LOG] [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6] -> [gke-runai-mvp-runai-pool-a6becd78-j7qs/runai-preinstall-diagnostics-qgss6]: node clocks are in sync
[PASS]
[COMPLETE]
cleaning up...
waiting for all resources to be deleted...
all resources were successfully deleted
[WARNING] Cluster setup includes components that will require the customization of Run:AI installation. For more details, see installation instructions