lollms-webui with Nvidia cuBLAS support in Docker

lollms-webui is a web interface for hosting Large Language Models (LLMs), with support for many different models and bindings.

This Dockerfile installs lollms and lollms-webui as Python libraries in a Docker image.

The Dockerfile is based on nvidia/cuda with Ubuntu and cuDNN. It should be used with the NVIDIA Container Toolkit to enable GPU support in Docker.
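
If the toolkit is installed, GPU access from containers can be sanity-checked by running nvidia-smi in the same base image (a quick optional check, not part of this gist):

docker run --rm --gpus all nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 nvidia-smi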

Build

Build the image locally:

docker build -t lollms-webui:0.0.1 .

When rebuilding, remember to clear the build cache so that the latest git versions of lollms and lollms-webui are pulled.
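
One way to do this is to bypass the cache for a single build with the --no-cache flag, for example:

docker build --no-cache -t lollms-webui:0.0.1 .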

Configure

Create a cache directory:

mkdir -p ~/.cache/lollms

It will be used for storing LLMs and configuration files.

Download a model supporting the new (as of June 2023) k-quant methods in llama.cpp, for example Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin (the model that start.sh loads by default), and place it in the cache directory ~/.cache/lollms/models/llama_cpp_official/.
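
A minimal sketch of placing the file, assuming the model has already been downloaded to the current directory:

mkdir -p ~/.cache/lollms/models/llama_cpp_official
mv ./Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin ~/.cache/lollms/models/llama_cpp_official/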

Starting

Run the container:

docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
  -p 8080:8080 --name lollms lollms-webui:0.0.1

The option -e CPU_THREADS=4 can be used to limit the number of CPU threads used by LLMs; otherwise, all available threads will be used.

The option --entrypoint bash can be used to start a shell instead of the web interface; both options are shown combined below.
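
For example (the thread count here is only illustrative):

docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
  -p 8080:8080 -e CPU_THREADS=4 --entrypoint bash \
  --name lollms lollms-webui:0.0.1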

The web interface will be available at http://localhost:8080.
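
Once the server has started, a quick way to confirm it is reachable from the host (optional check):

curl -I http://localhost:8080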

Dockerfile

FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    SHELL=/bin/bash \
    PATH="/opt/venv/bin:$PATH"
RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen
# Install apt packages
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y --no-install-recommends \
    git wget curl build-essential \
    python3-dev python3-venv python3-pip python-is-python3
RUN apt-get clean
# Create a virtual Python environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Upgrade pip
RUN pip install --upgrade pip setuptools wheel virtualenv
# Install Torch with CUDA 12.1 support
RUN pip install torch torchvision torchaudio --pre -f https://download.pytorch.org/whl/nightly/cu121/torch_nightly.html
# Install llama.cpp Python bindings from source, with cuBLAS enabled via CMake flags
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install git+https://github.com/abetlen/llama-cpp-python.git
# Install lollms and lollms-webui in editable mode for debugging
RUN mkdir /src
RUN git clone --recurse-submodules https://github.com/ParisNeo/lollms.git /src/lollms
RUN cd /src/lollms \
&& pip install -r requirements.txt -e .
RUN git clone --recurse-submodules https://github.com/ParisNeo/lollms-webui.git /src/lollms-webui
RUN cd /src/lollms-webui \
&& pip install -r requirements.txt -e .
# Entrypoint
ADD start.sh /
RUN chmod +x /start.sh
CMD [ "/start.sh" ]
start.sh

#!/bin/bash
# Show commands being run
set -x
# Default CPU_THREADS to all available cores if not set
[[ -z "$CPU_THREADS" ]] && CPU_THREADS=$(nproc)
# Configure lollms paths for the webui
[[ ! -f /src/lollms-webui/global_paths_cfg.yaml ]] && mkdir -p /src/lollms-webui && \
  printf 'lollms_path: /src/lollms/lollms\nlollms_personal_path: /cache' > /src/lollms-webui/global_paths_cfg.yaml
# Create download directories for the models
mkdir -p /cache/models/{py_llama_cpp,c_transformers,llama_cpp_official,binding_template,gpt_j_m,gpt_4all,open_ai,gpt_j_a,gptq,hugging_face} || true
# Activate the python environment and start the web server
source /opt/venv/bin/activate
cd /src/lollms-webui
python app.py -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin --host 0.0.0.0 --port 8080 --n_threads "$CPU_THREADS"