Using Anthos Service Mesh on GKE clusters

Source: https://cloud.google.com/service-mesh/docs/security/egress-gateway-gke-tutorial

Setting up the infrastructure

Initial Configurations

  1. Define variables
WORKING_DIRECTORY=<WORKING_DIRECTORY>
PROJECT_ID=<PROJECT_ID>
REGION=<REGION>
ZONE=<ZONE>
  1. Configure the default project, region, and zone
gcloud config set project ${PROJECT_ID}
gcloud config set compute/region ${REGION}
gcloud config set compute/zone ${ZONE}
  1. Enable compute.googleapis.com
gcloud services enable compute.googleapis.com --project=${PROJECT_ID}
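To confirm these defaults are active before continuing, you can list the current gcloud configuration (a quick sanity check, not part of the original tutorial):

# Show the active project, region, and zone
gcloud config list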

Create a VPC network and subnet

  1. Create a new VPC network
gcloud compute networks create vpc-network \
    --subnet-mode custom
  1. Create a subnet for the cluster to run in with pre-assigned secondary IP address ranges for Pods and services.
gcloud compute networks subnets create subnet-gke \
    --network vpc-network \
    --range 10.0.0.0/24 \
    --secondary-range pods=10.1.0.0/16,services=10.2.0.0/20 \
    --enable-private-ip-google-access
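If you want to confirm that the subnet was created with the expected secondary ranges and Private Google Access, describing it should show them (this assumes the subnet-gke name and ${REGION} value used above):

# Look for secondaryIpRanges and privateIpGoogleAccess: true in the output
gcloud compute networks subnets describe subnet-gke \
    --region ${REGION}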

Configure Cloud NAT

  1. Create a Cloud Router:
gcloud compute routers create nat-router \
    --network vpc-network
  1. Add a NAT configuration to the router:
gcloud compute routers nats create nat-config \
    --router nat-router \
    --nat-all-subnet-ip-ranges \
    --auto-allocate-nat-external-ips
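As an optional check, you can describe the NAT configuration on the router to confirm it was created as expected (using the nat-router and nat-config names from the steps above):

# Confirm the NAT config exists and auto-allocates external IPs
gcloud compute routers nats describe nat-config \
    --router nat-router \
    --region ${REGION}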

Create service accounts for each GKE node pool

  1. Create a service account for use by the nodes in the default node pool:
gcloud iam service-accounts create sa-application-nodes \
    --description="SA for application nodes" \
    --display-name="sa-application-nodes"
  1. Create a service account for use by the nodes in the gateway node pool:
gcloud iam service-accounts create sa-gateway-nodes \
    --description="SA for gateway nodes" \
    --display-name="sa-gateway-nodes"

Grant permissions to the service accounts

    project_roles=(
        roles/logging.logWriter
        roles/monitoring.metricWriter
        roles/monitoring.viewer
        roles/storage.objectViewer
    )
    for role in "${project_roles[@]}"
    do
        gcloud projects add-iam-policy-binding ${PROJECT_ID} \
            --member="serviceAccount:sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com" \
            --role="$role"
        gcloud projects add-iam-policy-binding ${PROJECT_ID} \
            --member="serviceAccount:sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com" \
            --role="$role"
    done
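To check that the bindings were applied, you can flatten the project IAM policy and filter on one of the accounts, for example the gateway node service account (a read-only check using standard gcloud filtering):

# List the roles granted to the gateway node service account
gcloud projects get-iam-policy ${PROJECT_ID} \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com" \
    --format="value(bindings.role)"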

Creating the firewall rules

  1. Create a default (low priority) firewall rule to deny all egress from the VPC network:
gcloud compute firewall-rules create global-deny-egress-all \
    --action DENY \
    --direction EGRESS \
    --rules all \
    --destination-ranges 0.0.0.0/0 \
    --network vpc-network \
    --priority 65535 \
    --description "Default rule to deny all egress from the network."
  1. Create a rule to allow only those nodes with the gateway service account to reach the internet:
gcloud compute firewall-rules create gateway-allow-egress-web \
    --action ALLOW \
    --direction EGRESS \
    --rules tcp:80,tcp:443 \
    --target-service-accounts sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com \
    --network vpc-network \
    --priority 1000 \
    --description "Allow the nodes running the egress gateways to connect to the web"
  1. Allow nodes to reach the Kubernetes control plane:
gcloud compute firewall-rules create allow-egress-to-api-server \
    --action ALLOW \
    --direction EGRESS \
    --rules tcp:443,tcp:10250 \
    --target-service-accounts sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com,sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com \
    --destination-ranges 10.5.0.0/28 \
    --network vpc-network \
    --priority 1000 \
    --description "Allow nodes to reach the Kubernetes API server."
  1. Optional: Allow the API server to call the webhooks exposed by istiod. This rule is not needed if you use managed Anthos Service Mesh.
gcloud compute firewall-rules create allow-ingress-api-server-to-webhook \
    --action ALLOW \
    --direction INGRESS \
    --rules tcp:15017 \
    --target-service-accounts sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com,sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com \
    --source-ranges 10.5.0.0/28 \
    --network vpc-network \
    --priority 1000 \
    --description "Allow the API server to call the webhooks exposed by istiod discovery"
  1. Allow egress connectivity between Nodes and Pods running on the cluster.
gcloud compute firewall-rules create allow-egress-nodes-and-pods \
    --action ALLOW \
    --direction EGRESS \
    --rules all \
    --target-service-accounts sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com,sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com \
    --destination-ranges 10.0.0.0/24,10.1.0.0/16 \
    --network vpc-network \
    --priority 1000 \
    --description "Allow egress to other Nodes and Pods"
  1. Allow access to the reserved sets of IP addresses used by Private Google Access for serving Google APIs, Container Registry, and other services:
gcloud compute firewall-rules create allow-egress-gcp-apis \
    --action ALLOW \
    --direction EGRESS \
    --rules tcp \
    --target-service-accounts sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com,sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com \
    --destination-ranges 199.36.153.8/30 \
    --network vpc-network \
    --priority 1000 \
    --description "Allow access to the VIPs used by Google Cloud APIs (Private Google Access)"
  1. Allow the Google Cloud health checker service to access pods running in the cluster.
gcloud compute firewall-rules create allow-ingress-gcp-health-checker \
    --action ALLOW \
    --direction INGRESS \
    --rules tcp:80,tcp:443 \
    --target-service-accounts sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com,sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com \
    --source-ranges 35.191.0.0/16,130.211.0.0/22,209.85.152.0/22,209.85.204.0/22 \
    --network vpc-network \
    --priority 1000 \
    --description "Allow workloads to respond to Google Cloud health checks"

Configuring private access to Google Cloud APIs

  1. Enable the Cloud DNS API
gcloud services enable dns.googleapis.com
  1. Create a private DNS zone, a CNAME, and A records so that nodes and workloads can connect to Google APIs and services using Private Google Access and the private.googleapis.com hostname
gcloud dns managed-zones create private-google-apis \
    --description "Private DNS zone for Google APIs" \
    --dns-name googleapis.com \
    --visibility private \
    --networks vpc-network

gcloud dns record-sets transaction start --zone private-google-apis

gcloud dns record-sets transaction add private.googleapis.com. \
    --name "*.googleapis.com" \
    --ttl 300 \
    --type CNAME \
    --zone private-google-apis

gcloud dns record-sets transaction add "199.36.153.8" \
"199.36.153.9" "199.36.153.10" "199.36.153.11" \
    --name private.googleapis.com \
    --ttl 300 \
    --type A \
    --zone private-google-apis

gcloud dns record-sets transaction execute --zone private-google-apis
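To confirm that the zone contains the expected records, list the record sets (this assumes the private-google-apis zone created above):

# The output should include a *.googleapis.com CNAME and private.googleapis.com A records
gcloud dns record-sets list --zone private-google-apis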

Configuring private access to Container Registry

  1. Create a private DNS zone, a CNAME and an A record so that nodes can connect to Container Registry using Private Google Access and the gcr.io hostname:
gcloud dns managed-zones create private-gcr-io \
    --description "private zone for Container Registry" \
    --dns-name gcr.io \
    --visibility private \
    --networks vpc-network

gcloud dns record-sets transaction start --zone private-gcr-io

gcloud dns record-sets transaction add gcr.io. \
    --name "*.gcr.io" \
    --ttl 300 \
    --type CNAME \
    --zone private-gcr-io

gcloud dns record-sets transaction add "199.36.153.8" "199.36.153.9" "199.36.153.10" "199.36.153.11" \
    --name gcr.io \
    --ttl 300 \
    --type A \
    --zone private-gcr-io

gcloud dns record-sets transaction execute --zone private-gcr-io

Create a private GKE cluster

  1. Find the external IP address of your Cloud Shell so that you can add it to the list of networks that are allowed to access your cluster's API server:
SHELL_IP=$(dig TXT -4 +short @ns1.google.com o-o.myaddr.l.google.com)
  1. After a period of inactivity, the external IP address of your Cloud Shell VM can change. If that happens, you must update your cluster's list of authorized networks. Add the following command to your initialization script:
cat << 'EOF' >> ./init-egress-tutorial.sh
SHELL_IP=$(dig TXT -4 +short @ns1.google.com o-o.myaddr.l.google.com)
gcloud container clusters update cluster1 \
    --enable-master-authorized-networks \
    --master-authorized-networks ${SHELL_IP//\"}/32
EOF
  1. Enable the Google Kubernetes Engine API:
gcloud services enable container.googleapis.com
  1. Create a private GKE cluster
gcloud container clusters create cluster1 \
    --enable-ip-alias \
    --enable-private-nodes \
    --release-channel "regular" \
    --enable-master-authorized-networks \
    --master-authorized-networks ${SHELL_IP//\"}/32 \
    --master-ipv4-cidr 10.5.0.0/28 \
    --enable-dataplane-v2 \
    --service-account "sa-application-nodes@${PROJECT_ID}.iam.gserviceaccount.com" \
    --machine-type "e2-standard-4" \
    --network "vpc-network" \
    --subnetwork "subnet-gke" \
    --cluster-secondary-range-name "pods" \
    --services-secondary-range-name "services" \
    --workload-pool "${PROJECT_ID}.svc.id.goog" \
    --zone ${ZONE}
  1. Create a node pool called gateway. This node pool is where the egress gateway is deployed. The dedicated=gateway:NoSchedule taint is added to every node in the gateway node pool.
gcloud container node-pools create "gateway" \
    --cluster "cluster1" \
    --machine-type "e2-standard-4" \
    --node-taints dedicated=gateway:NoSchedule \
    --service-account "sa-gateway-nodes@${PROJECT_ID}.iam.gserviceaccount.com" \
    --num-nodes "1"
  1. Download credentials so that you can connect to the cluster with kubectl:
gcloud container clusters get-credentials cluster1
  1. Verify that the gateway nodes have the correct taint:
kubectl get nodes -l cloud.google.com/gke-nodepool=gateway \
    -o=custom-columns='name:metadata.name,taints:spec.taints[?(@.key=="dedicated")]'
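Listing the node pools is also a convenient way to confirm that the cluster now has both the default pool and the dedicated gateway pool (using the cluster1 name and ${ZONE} from the steps above):

# Both "default-pool" and "gateway" should appear
gcloud container node-pools list \
    --cluster cluster1 \
    --zone ${ZONE}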

Installing and setting up Anthos Service Mesh

Enable APIs

  1. Enable Mesh APIs
gcloud services enable mesh.googleapis.com \
      --project ${PROJECT_ID}

Enable Anthos Service Mesh and join the cluster to the fleet

  1. Enable the Anthos Service Mesh fleet feature
gcloud container fleet mesh enable --project ${PROJECT_ID}
  1. Configure kubectl to point to the cluster.
gcloud container clusters get-credentials cluster1 \
     --zone ${ZONE} \
     --project ${PROJECT_ID}
  1. Get the cluster URI and store it in a variable (this assumes cluster1 is the only cluster in the project):
GKE_URI=$(gcloud container clusters list --uri)
  1. Register clusters to a fleet
gcloud container fleet memberships register antoshmesh \
  --gke-uri=${GKE_URI} \
  --enable-workload-identity \
  --project ${PROJECT_ID}
  1. Verify your cluster is registered:
gcloud container fleet memberships list --project ${PROJECT_ID}
  1. Apply the mesh_id label, where FLEET_PROJECT_NUMBER is the project number of the fleet host project:
FLEET_PROJECT_NUMBER=287832608307
gcloud container clusters update cluster1 --project ${PROJECT_ID} \
  --zone ${ZONE} --update-labels mesh_id=proj-${FLEET_PROJECT_NUMBER}
  1. Enable automatic management
MEMBERSHIP_NAME=antoshmesh
REGION=asia-southeast1
gcloud container fleet mesh update \
   --management automatic \
   --memberships ${MEMBERSHIP_NAME} \
   --project ${PROJECT_ID} \
   --location ${REGION}
  1. Verify the control plane has been provisioned
gcloud container fleet mesh describe --project ${PROJECT_ID}
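As an extra check after the fleet setup above, you can confirm that the mesh_id label was applied to the cluster; resourceLabels is the field GKE uses for cluster labels:

# The mesh_id=proj-<FLEET_PROJECT_NUMBER> label should appear in the output
gcloud container clusters describe cluster1 \
    --zone ${ZONE} \
    --format="value(resourceLabels)"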

Install an egress gateway

  1. Create a Kubernetes namespace for the egress gateway:
kubectl create namespace istio-egress
  1. When you deploy the egress gateway, its configuration is automatically injected based on a label you apply to the deployment or namespace. Verify the available control plane revision:
kubectl -n istio-system get controlplanerevision
  1. Optional: Label the namespace so that the gateway configuration is automatically injected. Set REVISION to the revision name returned by the previous command:
REVISION=<REVISION>
kubectl label namespace istio-egress istio.io/rev=${REVISION}
  1. Create an operator manifest for the egress gateway:
cat << EOF > egressgateway-operator.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: egressgateway-operator
  annotations:
    config.kubernetes.io/local-config: "true"
spec:
  profile: empty
  revision: ${REVISION}
  components:
    egressGateways:
    - name: istio-egressgateway
      namespace: istio-egress
      enabled: true
  values:
    gateways:
      istio-egressgateway:
        injectionTemplate: gateway
        tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "gateway"
        nodeSelector:
          cloud.google.com/gke-nodepool: "gateway"
EOF
  1. Download and install istioctl:
# Download the Anthos Service Mesh installation file to your current working directory:
curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.18.4-asm.0-linux-amd64.tar.gz

# Download the signature file and use openssl to verify the signature:
curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.18.4-asm.0-linux-amd64.tar.gz.1.sig
openssl dgst -verify /dev/stdin -signature istio-1.18.4-asm.0-linux-amd64.tar.gz.1.sig istio-1.18.4-asm.0-linux-amd64.tar.gz <<'EOF'
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
-----END PUBLIC KEY-----
EOF

# Extract the contents of the file to any location on your file system. 
tar xzf istio-1.18.4-asm.0-linux-amd64.tar.gz

# Ensure that you're in the Anthos Service Mesh installation's root directory.
cd istio-1.18.4-asm.0
  1. After extracting the downloaded archive, set an environment variable to hold the path to the istioctl tool and add it to your initialization script:
ISTIOCTL=$(find "$(pwd -P)" -name istioctl)
echo "ISTIOCTL=\"${ISTIOCTL}\"" >> ./init-egress-tutorial.sh
  1. Create the egress gateway installation manifest using the operator manifest and istioctl:
${ISTIOCTL} manifest generate \
    --filename egressgateway-operator.yaml \
    --output egressgateway \
    --cluster-specific
  1. Install the egress gateway using the manifests generated into the egressgateway directory:
kubectl apply --recursive --filename egressgateway/
  1. Check that the egress gateway is running on nodes in the gateway node pool:
kubectl get pods -n istio-egress -o wide
  1. The egress gateway pods have affinity for nodes in the gateway node pool and a toleration that lets them run on the tainted gateway nodes. Examine the node affinity and tolerations for the egress gateway pods:
kubectl -n istio-egress get pod -l istio=egressgateway \
    -o=custom-columns='name:metadata.name,node-affinity:spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms,tolerations:spec.tolerations[?(@.key=="dedicated")]'
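Two more quick checks after the installation above: the namespace label that drives injection, and the Service created for the gateway (named istio-egressgateway by the operator manifest above):

# The istio.io/rev label should be present on the namespace
kubectl get namespace istio-egress --show-labels

# The egress gateway Service should exist in the istio-egress namespace
kubectl -n istio-egress get service istio-egressgateway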

Enable Envoy access logging

  1. Run the following command to add accessLogFile: /dev/stdout to the mesh configuration. The ConfigMap name istio-release-channel should match your control plane release channel (for example, istio-asm-managed):
cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  mesh: |-
    accessLogFile: /dev/stdout
kind: ConfigMap
metadata:
  name: istio-release-channel
  namespace: istio-system
EOF
  1. Run the following command to view the configmap:
kubectl get configmap istio-release-channel -n istio-system -o yaml
  1. To verify that access logging is enabled, ensure the accessLogFile: /dev/stdout line appears in the mesh: section.
...
apiVersion: v1
data:
  mesh: |
    ....
    accessLogFile: /dev/stdout
...
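With access logging enabled, requests that traverse the egress gateway are written to the gateway pods' stdout; tailing those logs is how you can observe egress traffic later (the istio=egressgateway label matches the pods deployed above):

# Tail recent access log entries from the egress gateway pods
kubectl -n istio-egress logs -l istio=egressgateway --tail 20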

Preparing the mesh and a test application

  1. Make sure that STRICT mutual TLS is enabled. Apply a default PeerAuthentication policy for the mesh in the istio-system namespace:
cat <<EOF | kubectl apply -f -
apiVersion: "security.istio.io/v1beta1"
kind: "PeerAuthentication"
metadata:
  name: "default"
  namespace: "istio-system"
spec:
  mtls:
    mode: STRICT
EOF
  1. Create namespaces to use for deploying test workloads. Later steps in this tutorial explain how to configure different egress routing rules for each namespace.
kubectl create namespace team-x
kubectl create namespace team-y
  1. Label the namespaces so that they can be selected by Kubernetes network policies:
kubectl label namespace team-x team=x
kubectl label namespace team-y team=y
  1. For Anthos Service Mesh to automatically inject proxy sidecars, set the control plane revision label on the workload namespaces:
kubectl label ns team-x istio.io/rev=${REVISION}
kubectl label ns team-y istio.io/rev=${REVISION}
  1. Create a YAML file to use for making test deployments:
cat << 'EOF' > ./test.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test
---
apiVersion: v1
kind: Service
metadata:
  name: test
  labels:
    app: test
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      serviceAccountName: test
      containers:
      - name: test
        image: gcr.io/google.com/cloudsdktool/cloud-sdk:slim
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
EOF
  1. Deploy the test application to the team-x namespace:
kubectl -n team-x create -f ./test.yaml
  1. Verify that the test application is deployed to a node in the default pool and that a proxy sidecar container is injected. Repeat the following command until the pod's status is Running:
kubectl -n team-x get po -l app=test -o wide
  1. Verify that it is not possible to make an HTTP request from the test container to an external site:
kubectl -n team-x exec -it \
    $(kubectl -n team-x get pod -l app=test -o jsonpath={.items..metadata.name}) \
    -c test -- curl -v http://example.com
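The request should fail, because direct egress from the application nodes is blocked by the deny-all firewall rule created earlier. You can also confirm that the sidecar was injected into the test pod by listing its containers (the test container name comes from the deployment above; istio-proxy is the standard injected sidecar):

# Expect both "test" and "istio-proxy" in the output
kubectl -n team-x get pod -l app=test \
    -o jsonpath='{.items[0].spec.containers[*].name}'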