@steren
Last active April 29, 2024 05:32
llamafile container image
FROM debian:latest
RUN apt-get update && apt-get install -y --no-install-recommends wget ca-certificates && rm -rf /var/lib/apt/lists/*
# Update this to the URL pointing at the llamafile you want to run.
# Find other models at https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#other-example-llamafiles
ENV LLAMAFILE_DOWNLOAD_URL="https://huggingface.co/jartine/Mistral-7B-Instruct-v0.2-llamafile/resolve/main/mistral-7b-instruct-v0.2.Q4_0.llamafile?download=true"
# Download the llamafile and make it executable
RUN wget "$LLAMAFILE_DOWNLOAD_URL" -O ./model.llamafile && chmod +x ./model.llamafile
# Use the llamafile executable as the container start command.
# Default arguments (overridable at runtime by appending arguments to `docker run`):
# use the NVIDIA GPU, offload as many layers as possible to it (-ngl 9999),
# listen on 0.0.0.0, and do not attempt to open a browser.
ENTRYPOINT ["./model.llamafile"]
CMD ["--gpu", "nvidia", "-ngl", "9999", "--host", "0.0.0.0", "--nobrowser"]
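Assuming a host with Docker and the NVIDIA Container Toolkit installed, the image can be built and run roughly as follows (the image name and host port are placeholders, and the llamafile server's default port of 8080 is assumed):

```shell
# Build the image; the model llamafile (several GB) is downloaded at build time.
docker build -t llamafile-server .

# Run with GPU access; the server listens on 0.0.0.0:8080 inside the container.
docker run --gpus all -p 8080:8080 llamafile-server

# Because the arguments live in CMD, they can be replaced at run time,
# e.g. to run CPU-only by offloading zero layers:
docker run -p 8080:8080 llamafile-server -ngl 0 --host 0.0.0.0 --nobrowser
```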