Steps to reproduce a gVisor sandbox pod-to-pod communication error

create sandboxed cluster

Set the PROJECT environment variable to your Google Cloud project name.
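
For example (the project name below is a placeholder):

$ export PROJECT="my-gcp-project"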

gcloud beta container \
    --project "${PROJECT}" \
    clusters create "sandboxed" \
    --zone "us-west1-a" \
    --no-enable-basic-auth \
    --cluster-version "1.13.7-gke.24" \
    --machine-type "n1-standard-2" \
    --image-type "COS_CONTAINERD" \
    --disk-type "pd-standard" \
    --disk-size "25" \
    --metadata disable-legacy-endpoints=true \
    --sandbox type="gvisor" \
    --num-nodes "3" \
    --enable-cloud-logging \
    --enable-cloud-monitoring \
    --enable-ip-alias \
    --network "projects/${PROJECT}/global/networks/default" \
    --subnetwork "projects/${PROJECT}/regions/us-west1/subnetworks/default" \
    --default-max-pods-per-node "110" \
    --addons HorizontalPodAutoscaling,HttpLoadBalancing \
    --enable-autoupgrade \
    --enable-autorepair \
    --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append"

connect to cluster

$ gcloud container clusters get-credentials sandboxed --zone us-west1-a --project ${PROJECT}

check runtime on nodes

All nodes in this configuration are from the gVisor sandboxed node pool.

$ kubectl get nodes -o wide
NAME                                 STATUS   ROLES    AGE     VERSION          INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
gke-sandboxed-gvisor-7468a574-7cp5   Ready    <none>   7m26s   v1.13.7-gke.24   10.138.0.63   35.247.102.177   Container-Optimized OS from Google   4.14.137+        containerd://1.2.7
gke-sandboxed-gvisor-7468a574-7gn0   Ready    <none>   7m27s   v1.13.7-gke.24   10.138.0.62   35.185.225.221   Container-Optimized OS from Google   4.14.137+        containerd://1.2.7
gke-sandboxed-gvisor-7468a574-szt9   Ready    <none>   7m26s   v1.13.7-gke.24   10.138.0.61   35.227.162.211   Container-Optimized OS from Google   4.14.137+        containerd://1.2.7

check runtimeclass

$ kubectl get runtimeclass
NAME     AGE
gvisor   15m
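
The RuntimeClass maps pods that set runtimeClassName: gvisor to the gVisor (runsc) handler configured in containerd on the sandboxed nodes. You can inspect it with kubectl get runtimeclass gvisor -o yaml; a rough sketch of the object (an assumption, not copied from the cluster — on a 1.13 cluster the RuntimeClass API may still be node.k8s.io/v1alpha1 with a different schema):

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: gvisor
handler: gvisor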

test pod-to-pod communication

Here is a simple server/client pod configuration that tests pod-to-pod communication by exchanging ping-pong messages: the client sends "ping" (a string) to the server, the server responds with "pong", and the client repeats this n times before exiting.
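
A rough sketch of the exchange (illustration only: the real binaries use gRPC, and the line-delimited framing here is an assumption; the address matches the Service defined below):

// Illustrative Go sketch of the ping-pong message flow; the actual
// client/server use gRPC, not raw TCP. Host and port match the
// Kubernetes manifests below.
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
)

func main() {
	// Dial the server's ClusterIP Service; in the repro below this is
	// where the client fails with an i/o timeout.
	conn, err := net.Dial("tcp", "ping-pong-gvisor-server-service:5001")
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	reader := bufio.NewReader(conn)
	for i := 0; i < 100; i++ { // the client's --iterations flag defaults to 100
		fmt.Fprintln(conn, "ping")            // send "ping"
		reply, err := reader.ReadString('\n') // expect "pong" back
		if err != nil {
			log.Fatalf("read: %v", err)
		}
		fmt.Printf("iteration %d: %s", i, reply)
	}
}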

start server

Spin up a gRPC-based server as a Deployment and expose it through a ClusterIP Service.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ping-pong-gvisor-server-deployment
spec:
  selector:
    matchLabels:
      app: ping-pong-gvisor-server
  template:
    metadata:
      name: ping-pong-gvisor-server
      labels:
        app: ping-pong-gvisor-server
        name: ping-pong-gvisor
    spec:
      runtimeClassName: gvisor
      containers:
      - name: ping-pong-gvisor-server
        image: docker.io/sdeoras/ping-pong-server:85e47e08-clean
        command:
          - /ping-pong-server
        args:
          - run
          - --host
          - 0.0.0.0
          - --port
          - "5001"
        resources:
          requests:
            memory: "32Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 5001
---
apiVersion: v1
kind: Service
metadata:
  name: ping-pong-gvisor-server-service
spec:
  type: ClusterIP
  selector:
    app: ping-pong-gvisor-server
  ports:
  - port: 5001
    targetPort: 5001
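
Save the Deployment and Service manifests above (the filename server.yaml is an assumption) and apply them:

$ kubectl apply -f server.yaml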

check server pod and service

$ kubectl get pods -o wide
NAME                                                 READY   STATUS    RESTARTS   AGE   IP          NODE                                 NOMINATED NODE   READINESS GATES
ping-pong-gvisor-server-deployment-6dd7c668d-lhj7m   1/1     Running   0          9s    10.12.2.2   gke-sandboxed-gvisor-7468a574-7cp5   <none>           <none>
$ kubectl get svc -o wide
NAME                              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE   SELECTOR
kubernetes                        ClusterIP   10.16.0.1     <none>        443/TCP    22m   <none>
ping-pong-gvisor-server-service   ClusterIP   10.16.4.116   <none>        5001/TCP   45s   app=ping-pong-gvisor-server
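
To rule out a Service selector mismatch, you can also confirm the Service has an endpoint for the server pod:

$ kubectl get endpoints ping-pong-gvisor-server-service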

start client

Spin up the client as a Kubernetes Job. The pod anti-affinity rule on the name=ping-pong-gvisor label (which both pods carry) forces the client onto a different node than the server, so the ping-pong traffic must cross nodes.

apiVersion: batch/v1
kind: Job
metadata:
  name: ping-pong-gvisor-job
spec:
  template:
    metadata:
      name: ping-pong-gvisor-client
      labels:
        name: ping-pong-gvisor
    spec:
      runtimeClassName: gvisor
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: name
                operator: In
                values:
                - ping-pong-gvisor
            topologyKey: kubernetes.io/hostname
      restartPolicy: OnFailure
      containers:
      - name: ping-pong-gvisor-client
        image: docker.io/sdeoras/ping-pong-client:85e47e08-clean
        imagePullPolicy: IfNotPresent
        command:
          - /ping-pong-client
        args:
          - run
          - --host
          - ping-pong-gvisor-server-service
          - --port
          - "5001"
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
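
Save the Job manifest (client.yaml is an assumed filename) and apply it:

$ kubectl apply -f client.yaml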

check client pod

The anti-affinity rule worked as intended: the client pod landed on gke-sandboxed-gvisor-7468a574-7gn0 while the server runs on gke-sandboxed-gvisor-7468a574-7cp5.

$ kubectl get pods -o wide
NAME                                                 READY   STATUS    RESTARTS   AGE     IP          NODE                                 NOMINATED NODE   READINESS GATES
ping-pong-gvisor-job-4vm8h                           1/1     Running   0          15s     10.12.1.2   gke-sandboxed-gvisor-7468a574-7gn0   <none>           <none>
ping-pong-gvisor-server-deployment-6dd7c668d-lhj7m   1/1     Running   0          4m44s   10.12.2.2   gke-sandboxed-gvisor-7468a574-7cp5   <none>           <none>

get client logs

$ kubectl logs pods/ping-pong-gvisor-job-4vm8h
Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
Usage:
  pingpong-client run [flags]

Flags:
  -h, --help             help for run
      --host string      hostname
      --iterations int   number of ping pong iterations (default 100)
      --port string      port number (default "5001")

Global Flags:
      --config string   config file (default is $HOME/.pingpong-client.yaml)

rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"

The i/o timeout while dialing means the client pod never establishes a TCP connection to the server Service: pod-to-pod communication between sandboxed pods on different nodes fails.