@yourbuddyconner
Created June 19, 2024 19:06
SkyPilot llama.cpp Serving Config
# service.yaml
service:
  readiness_probe: /v1/models
  replicas: 1

# Fields below describe each replica.
resources:
  ports: 8000
  cpus: 4+
  accelerators: {A100:1}

setup: |
  conda create -n llamacpp python=3.9 -y
  conda activate llamacpp
  # Build llama-cpp-python with CUDA support; quote the extras so the
  # shell does not try to glob the brackets.
  CMAKE_ARGS="-DLLAMA_CUDA=on" pip install "llama-cpp-python[server]"
  mkdir -p $(pwd)/models
  # Quote the URL so the shell does not interpret the '?' in the query string.
  wget "https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF/resolve/main/openhermes-2.5-mistral-7b.Q8_0.gguf?download=true" -O $(pwd)/models/model.gguf

run: |
  conda activate llamacpp
  # Serve an OpenAI-compatible API on port 8000; --n_gpu_layers=33 offloads
  # all of Mistral-7B's layers to the GPU.
  python3 -m llama_cpp.server --model=$(pwd)/models/model.gguf \
    --host=0.0.0.0 --chat_format=chatml --logits_all=True \
    --tensor_split=$SKYPILOT_NUM_GPUS_PER_NODE \
    --n_gpu_layers=33
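To try this out, here is a minimal sketch of deploying and querying the service with the SkyPilot CLI. The service name llamacpp and the prompt are placeholders, and exact flags may vary by SkyPilot version:

# Launch the service (the name "llamacpp" is arbitrary).
sky serve up service.yaml -n llamacpp

# Fetch the load balancer endpoint once the replica is READY.
ENDPOINT=$(sky serve status llamacpp --endpoint)

# The readiness probe path doubles as a quick health check.
curl "http://$ENDPOINT/v1/models"

# Send a chat completion through the OpenAI-compatible API.
curl "http://$ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 64}'

Because llama_cpp.server exposes OpenAI-compatible routes, any OpenAI client can also be pointed at http://$ENDPOINT/v1 instead of using curl.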