Skip to content

Instantly share code, notes, and snippets.

@romilbhardwaj
Last active July 11, 2024 02:15
Show Gist options
  • Save romilbhardwaj/b5b6b893e7a3749a2815f055f3f5351c to your computer and use it in GitHub Desktop.
Save romilbhardwaj/b5b6b893e7a3749a2815f055f3f5351c to your computer and use it in GitHub Desktop.
Serve Gemma with SkyPilot
envs:
MODEL_NAME: google/gemma-2b-it
HF_TOKEN: # TODO: Fill with your own huggingface token, or use --env to pass.
resources:
image_id: docker:vllm/vllm-openai:latest
accelerators: L4:1
ports: 8000
setup: |
conda deactivate
python3 -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"
run: |
conda deactivate
echo 'Starting vllm openai api server...'
python -m vllm.entrypoints.openai.api_server \
--model $MODEL_NAME --tokenizer hf-internal-testing/llama-tokenizer \
--host 0.0.0.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment