
Gist by Jonpro03 (604430a3e64735a0a9df6b7e385d15be), last active May 5, 2024 15:03
Dockerfile to run serge-chat in a Docker container with NVIDIA GPU (CUDA) acceleration
Updated to serge 0.9.0 and CUDA 12.4.1.
Ready-to-run images available here:
Tested with a Quadro P2000 on Ubuntu 22.04
1. Install Docker.
2. Install NVIDIA driver 535.
3. Install CUDA 12.4.1.
4. Install nvidia-container-toolkit (follow the WSL2 instructions if you are on Windows).
5. Launch the container with the --gpus all flag, for example:
   docker run --gpus all -v weights:/usr/src/app/weights -v datadb:/data/db/ -p 8008:8008 jonpro03/cuda_serge:latest
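Before launching, a quick host-side sanity check is to compare the installed driver's major version with the 535 listed above. The following is a minimal POSIX shell sketch; the helper name `driver_major_at_least` is hypothetical (not part of serge or this gist), and it compares major versions only.

```shell
# Hypothetical helper (not part of serge or this gist): succeeds when the
# driver version string in $2 (e.g. "535.171.04") has a major version >= $1.
driver_major_at_least() {
  required_major=$1
  installed_version=$2
  # Keep only the part before the first "." (the major version).
  installed_major=${installed_version%%.*}
  # Reject empty or non-numeric input instead of letting [ ] error out.
  case "$installed_major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$installed_major" -ge "$required_major" ]
}
```

On the host, `driver_major_at_least 535 "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" && echo "driver OK"` would run the check. It says nothing about the CUDA or nvidia-container-toolkit installs.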
The Dockerfile is configured to be fairly backwards-compatible with older systems. You can try turning F16C/FMA/AVX2 back on and checking whether it compiles on your system. This backwards-compatible build corresponds to the _compat image tag.
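To decide which CPU features to switch off for an older machine, the flags line of `/proc/cpuinfo` can be mapped onto the llama.cpp CMake options named in the Dockerfile comment. This is a sketch under assumptions: the helper `compat_cmake_args` is hypothetical, it uses the `LLAMA_*` option names from that comment (current llama.cpp releases name them `GGML_*`), and it always disables native tuning because the image is usually built on a different machine than the one that runs it.

```shell
# Hypothetical helper: given a CPU flags list (as on the "flags" line of
# /proc/cpuinfo), print llama.cpp CMake options that switch off the SIMD
# features the CPU lacks. Assumes the older LLAMA_* option names.
compat_cmake_args() {
  # Pad with spaces so " avx " cannot match inside " avx2 ".
  cpu_flags=" $1 "
  # -march=native targets the build machine, not the machine running the image.
  args="-DLLAMA_NATIVE=OFF"
  for feature in avx avx2 fma f16c; do
    case "$cpu_flags" in
      *" $feature "*) ;;  # feature present: leave the option at its default
      *) args="$args -DLLAMA_$(printf '%s' "$feature" | tr '[:lower:]' '[:upper:]')=OFF" ;;
    esac
  done
  printf '%s\n' "$args"
}
```

For example, `compat_cmake_args "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"` prints the options for the current CPU. llama-cpp-python reads extra CMake options from the `CMAKE_ARGS` environment variable, so the output would go there when the wheel is built in the build stage.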
If GPU acceleration is working, you should see something similar to this in the container's stdout:
ggml_init_cublas: found 1 CUDA devices:
Device 0: Quadro P2000, compute capability 6.1
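The check above can be scripted by scanning the container logs for the device-count line. A minimal sketch, assuming a POSIX shell whose `sed` supports `-E`; the helper `cuda_device_count` is hypothetical, and the `ggml_cuda_init` prefix is an assumption about how newer llama.cpp builds word the same message.

```shell
# Hypothetical helper: read container logs on stdin and print the number of
# CUDA devices llama.cpp reported, or nothing if no such line was logged.
# Accepts the "ggml_init_cublas" prefix shown above and, as an assumption,
# the "ggml_cuda_init" prefix used by newer llama.cpp builds.
cuda_device_count() {
  sed -nE 's/^ggml_(init_cublas|cuda_init): found ([0-9]+) CUDA devices:$/\2/p' | head -n 1
}
```

For example, `docker logs <container> 2>&1 | cuda_device_count` prints `1` for the log shown above. Empty output means no such line was logged, which usually means the GPU was not picked up.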
# ---------------------------------------
# Base image for redis
FROM redis:7-bookworm AS redis
# ---------------------------------------
# Build frontend
FROM node:20-bookworm-slim AS frontend
WORKDIR /usr/src/app
COPY ./web/package.json ./web/package-lock.json ./
RUN npm ci
COPY ./web /usr/src/app/web/
WORKDIR /usr/src/app/web/
RUN npm run build
# ---------------------------------------
# Build environment
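FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS build
# Set ENV
# Compile llama-cpp-python with CUDA; it reads extra CMake options from CMAKE_ARGS.
# Assumption: the option name depends on the llama-cpp-python release. Current releases
# use -DGGML_CUDA=on; older ones used -DLLAMA_CUDA=on or -DLLAMA_CUBLAS=on.
ENV CMAKE_ARGS="-DGGML_CUDA=on"
# Optional extras might be required for older CPUs: -DLLAMA_NATIVE=OFF -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF
# (named GGML_NATIVE, GGML_AVX, GGML_AVX2 and GGML_FMA in current releases)
# Install build dependencies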
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip build-essential
# Build llama-cpp-python (plus its dependencies) as wheels for the runtime stage
RUN pip wheel llama-cpp-python --wheel-dir=/usr/local/bin/wheels
# ---------------------------------------
# Runtime environment
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04 AS release
# Set ENV
ENV NODE_ENV='production'
WORKDIR /usr/src/app
# Copy artifacts
COPY --from=redis /usr/local/bin/redis-server /usr/local/bin/redis-server
COPY --from=redis /usr/local/bin/redis-cli /usr/local/bin/redis-cli
COPY --from=frontend /usr/src/app/web/build /usr/src/app/api/static/
COPY --from=build /usr/local/bin/wheels /usr/local/bin/wheels
COPY ./api /usr/src/app/api
COPY scripts/ /usr/src/app/
COPY scripts/serge.env /usr/src/app/serge.env
COPY vendor/requirements.txt /usr/src/app/requirements.txt
# Install api dependencies
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip python-is-python3 libxml2 build-essential dumb-init \
&& pip install --no-cache-dir ./api \
&& apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* \
&& chmod 755 /usr/src/app/ \
&& chmod 755 /usr/local/bin/redis-server \
&& chmod 755 /usr/local/bin/redis-cli \
&& mkdir -p /etc/redis \
&& mkdir -p /data/db \
&& mkdir -p /usr/src/app/weights \
&& echo "appendonly yes" >> /etc/redis/redis.conf \
&& echo "dir /data/db/" >> /etc/redis/redis.conf
RUN pip install --no-index --find-links=/usr/local/bin/wheels llama-cpp-python
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["/bin/bash", "-c", "/usr/src/app/"]
Aeyk commented Nov 4, 2023

Why did Redis get replaced with Dragonfly? Is that necessary to get CUDA working in this container?

gaby commented Dec 4, 2023

@Aeyk Changes are being tracked in serge-chat/serge#43; GPU support is coming out this week. Dragonfly was removed.
