@christian-posta
Created January 22, 2025 18:49
Completely Made Up Instructions for NVIDIA NIM + GKE

Here's a step-by-step guide to creating a cost-conscious Kubernetes cluster in Google Kubernetes Engine (GKE), configuring nodes with GPUs, setting up NVIDIA NGC Infrastructure Manager (NIM), and deploying an LLM behind an OpenAI-compatible API.


Step 1: Prerequisites

  1. Google Cloud Account: Ensure you have an active Google Cloud account.
  2. gcloud CLI: Install the Google Cloud SDK.
  3. kubectl: Install kubectl if it's not already installed.
  4. NVIDIA GPU Driver Support: Ensure you have access to NVIDIA resources and APIs.
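Before starting, it can save time to confirm the prerequisite CLIs are actually on your PATH. A minimal Python sketch (the tool names are the standard binaries; adjust if yours differ):

```python
import shutil

def missing_tools(tools):
    """Return the subset of required CLI tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools(["gcloud", "kubectl", "curl"])
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisite tools found.")
```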

Step 2: Enable Required APIs and Create a Project

  1. Set your project:

    gcloud config set project [PROJECT_ID]
  2. Enable GKE and Compute Engine APIs:

    gcloud services enable container.googleapis.com compute.googleapis.com
  3. Create a service account key (optional, for programmatic access):

    gcloud iam service-accounts keys create key.json \
        --iam-account [SERVICE_ACCOUNT]@[PROJECT_ID].iam.gserviceaccount.com

Step 3: Create the GKE Cluster

  1. Create the Kubernetes cluster:

    gcloud container clusters create nim-cluster \
        --zone us-central1-a \
        --num-nodes 3 \
        --machine-type e2-standard-2 \
        --cluster-version latest
  2. Add a GPU node pool:

    gcloud container node-pools create gpu-pool \
        --cluster nim-cluster \
        --zone us-central1-a \
        --accelerator type=nvidia-tesla-t4,count=1 \
        --machine-type n1-standard-4 \
        --num-nodes 1 \
        --node-labels purpose=gpu \
        --node-taints gpu=true:NoSchedule
  3. Install NVIDIA GPU drivers on the GPU nodes. On GKE this is done with Google's driver-installer DaemonSet (the NVIDIA device-plugin manifest alone does not install drivers):

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
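The GPU node pool created above carries the gpu=true:NoSchedule taint, which keeps ordinary pods off the expensive GPU nodes; any pod that should land there must carry a matching toleration. A small Python sketch that turns a taint string (as passed to --node-taints) into the equivalent pod-spec toleration block:

```python
def toleration_for_taint(taint):
    """Convert a 'key=value:Effect' taint string (the format used by
    gcloud's --node-taints flag) into the pod-spec toleration dict
    that allows a pod to schedule onto the tainted nodes."""
    kv, effect = taint.split(":")
    key, value = kv.split("=")
    return {
        "key": key,
        "operator": "Equal",
        "value": value,
        "effect": effect,
    }

print(toleration_for_taint("gpu=true:NoSchedule"))
```

Paste the resulting dict under `tolerations:` in any pod spec that requests `nvidia.com/gpu`.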

Step 4: Install the NVIDIA NGC Infrastructure Manager (NIM)

  1. Add the Helm repository for NVIDIA:

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
        && helm repo update
  2. Install the NIM operator:

    helm install nim-operator nvidia/nim-operator --namespace nvidia \
        --create-namespace
  3. Verify installation:

    kubectl get pods -n nvidia
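To script this verification step, you can parse the plain-text output of kubectl get pods and flag anything not yet healthy. A sketch; the sample output below is illustrative, not captured from a real cluster:

```python
def not_running(kubectl_output):
    """Given `kubectl get pods` plain-text output, return the names
    of pods whose STATUS column is neither Running nor Completed."""
    lines = kubectl_output.strip().splitlines()[1:]  # skip header row
    bad = []
    for line in lines:
        cols = line.split()
        name, status = cols[0], cols[2]  # NAME and STATUS columns
        if status not in ("Running", "Completed"):
            bad.append(name)
    return bad

sample = """NAME                READY   STATUS             RESTARTS   AGE
nim-operator-abc    1/1     Running            0          2m
nim-operator-def    0/1     CrashLoopBackOff   3          2m"""
print(not_running(sample))  # ['nim-operator-def']
```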

Step 5: Deploy an LLM Using OpenAI API

  1. Deploy a sample application:

    Create a deployment file (llm-deployment.yaml):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: openai-llm
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: openai-llm
      template:
        metadata:
          labels:
            app: openai-llm
        spec:
          # Schedule onto the GPU pool and tolerate its NoSchedule taint;
          # without this toleration the pod will never be scheduled.
          nodeSelector:
            purpose: gpu
          tolerations:
          - key: gpu
            operator: Equal
            value: "true"
            effect: NoSchedule
          containers:
          - name: openai-llm
            image: python:3.9-slim
            command: ["python"]
            args: ["-m", "http.server", "8000"]
            ports:
            - containerPort: 8000
            resources:
              limits:
                nvidia.com/gpu: 1
            env:
            - name: OPENAI_API_KEY
              value: "<your-openai-api-key>"

    Apply the deployment:

    kubectl apply -f llm-deployment.yaml
  2. Expose the service:

    kubectl expose deployment openai-llm --type=LoadBalancer --name=openai-service \
        --port=8000 --target-port=8000
  3. Verify the service is accessible:

    kubectl get svc

    Note the external IP of the service.
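The external IP can take a minute or two to be assigned (it shows as `<pending>` until then). A small sketch that pulls it out of the plain-text `kubectl get svc` output; the sample below, including the IP, is illustrative:

```python
def external_ip(svc_output, name):
    """Return the EXTERNAL-IP for service `name` from `kubectl get svc`
    plain-text output, or None while it is still <pending> or absent."""
    for line in svc_output.strip().splitlines()[1:]:  # skip header row
        cols = line.split()
        if cols[0] == name:
            ip = cols[3]  # EXTERNAL-IP column
            return None if ip == "<pending>" else ip
    return None

svc_sample = """NAME             TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
openai-service   LoadBalancer   10.0.12.34   34.42.0.99    8000:30123/TCP   1m"""
print(external_ip(svc_sample, "openai-service"))  # 34.42.0.99
```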


Step 6: Test the Setup

  1. Install curl on your local machine (if not already installed).
  2. Test the deployed LLM service:
    curl -X POST http://<EXTERNAL_IP>:8000/inference \
        -H "Authorization: Bearer <your-openai-api-key>" \
        -d '{"prompt": "What is Kubernetes?", "max_tokens": 100}'
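The same request can be issued from Python with only the standard library. This sketch builds the request without sending it, since the /inference path and its payload shape are whatever your deployment actually exposes; send it with urllib.request.urlopen(req) once the placeholders are filled in:

```python
import json
import urllib.request

def build_inference_request(host, api_key, prompt, max_tokens=100):
    """Build (but do not send) the POST request from the curl example:
    an /inference call with a bearer token and a JSON body."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        f"http://{host}:8000/inference",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_inference_request("<EXTERNAL_IP>", "<your-openai-api-key>",
                              "What is Kubernetes?")
```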

Step 7: Monitor Costs

  • Use the Google Cloud Console billing reports to monitor the cost of your GPU node pool.
  • Shut down or scale the GPU node pool when not in use:
    gcloud container clusters resize nim-cluster \
        --node-pool gpu-pool \
        --num-nodes 0 --zone us-central1-a
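To see why scaling to zero matters, a back-of-the-envelope calculation. The $0.35/hour figure is an illustrative GPU-node rate, not a quoted price; check current pricing in the console:

```python
def monthly_gpu_cost(hourly_rate, hours_per_day, days=30):
    """Estimated monthly cost of keeping one GPU node up."""
    return hourly_rate * hours_per_day * days

always_on = monthly_gpu_cost(0.35, 24)  # node runs around the clock
work_hours = monthly_gpu_cost(0.35, 8)  # scaled to zero off-hours
print(f"always on: ${always_on:.2f}/mo, work hours only: ${work_hours:.2f}/mo")
```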

Additional Notes

  • You can adapt this workflow to install specific NIM models by updating the Helm values file.
  • Replace placeholders like <your-openai-api-key> and <EXTERNAL_IP> with your actual values.
  • For more advanced configurations, refer to the NVIDIA documentation.

Let me know if you need any further assistance!
