lollms-webui with Nvidia cuBLAS support in Docker

lollms-webui is a web interface for hosting Large Language Models (LLMs), with support for many different models and bindings.

This Dockerfile installs lollms and lollms-webui as Python libraries in a Docker image.

The Dockerfile is based on nvidia/cuda with Ubuntu and cuDNN. It should be used with the NVIDIA Container Toolkit to enable GPU support in Docker.
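
If the toolkit is installed, GPU access from containers can be sanity-checked by running nvidia-smi in the same base image (a quick optional check, not part of this gist):

docker run --rm --gpus all nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 nvidia-smi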

Build

Build the image locally:

docker build -t lollms-webui:0.0.1 .

When rebuilding, remember to clear the build cache so that the latest git versions of lollms and lollms-webui are pulled.
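
One way to do this is to bypass the cache for a single build with the --no-cache flag, for example:

docker build --no-cache -t lollms-webui:0.0.1 .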

Configure

Create a cache directory:

mkdir -p ~/.cache/lollms

It will be used for storing LLMs and configuration files.

Download a model supporting the new (as of June 2023) k-quant methods in llama.cpp, for example Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin (the model that start.sh loads by default), and place it in the cache directory ~/.cache/lollms/models/llama_cpp_official/.
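
A minimal sketch of placing the file, assuming the model has already been downloaded to the current directory:

mkdir -p ~/.cache/lollms/models/llama_cpp_official
mv ./Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin ~/.cache/lollms/models/llama_cpp_official/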

Starting

Run the container:

docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
  -p 8080:8080 --name lollms lollms-webui:0.0.1

The option -e CPU_THREADS=4 can be used to limit the number of CPU threads used by LLMs; otherwise, all available threads will be used.

The option --entrypoint bash can be used to start a shell instead of the web interface; both options are shown combined below.
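
For example (the thread count here is only illustrative):

docker run --rm -it --gpus all -v ~/.cache/lollms:/cache \
  -p 8080:8080 -e CPU_THREADS=4 --entrypoint bash \
  --name lollms lollms-webui:0.0.1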

The web interface will be available at http://localhost:8080.
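
Once the server has started, a quick way to confirm it is reachable from the host (optional check):

curl -I http://localhost:8080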

Dockerfile

FROM nvcr.io/nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    SHELL=/bin/bash \
    PATH="/opt/venv/bin:$PATH"
RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen
# Install apt packages
RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y --no-install-recommends \
    git wget curl build-essential \
    python3-dev python3-venv python3-pip python-is-python3
RUN apt-get clean
# Create a virtual Python environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Upgrade pip
RUN pip install --upgrade pip setuptools wheel virtualenv
# Install Torch with CUDA 12.1 support
RUN pip install torch torchvision torchaudio --pre -f https://download.pytorch.org/whl/nightly/cu121/torch_nightly.html
# Install llama.cpp Python bindings from source, with cuBLAS enabled via CMake flags
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install git+https://github.com/abetlen/llama-cpp-python.git
# Install lollms and lollms-webui in editable mode for debugging
RUN mkdir /src
RUN git clone --recurse-submodules https://github.com/ParisNeo/lollms.git /src/lollms
RUN cd /src/lollms \
&& pip install -r requirements.txt -e .
RUN git clone --recurse-submodules https://github.com/ParisNeo/lollms-webui.git /src/lollms-webui
RUN cd /src/lollms-webui \
&& pip install -r requirements.txt -e .
# Entrypoint
ADD start.sh /
RUN chmod +x /start.sh
CMD [ "/start.sh" ]
start.sh

#!/bin/bash
# Show commands being run
set -x
# Default CPU_THREADS to all available cores if not set
[[ -z "$CPU_THREADS" ]] && CPU_THREADS=$(nproc)
# Configure lollms paths for the webui
[[ ! -f /src/lollms-webui/global_paths_cfg.yaml ]] && mkdir -p /src/lollms-webui && \
  printf 'lollms_path: /src/lollms/lollms\nlollms_personal_path: /cache' > /src/lollms-webui/global_paths_cfg.yaml
# Create download directories for the models
mkdir -p /cache/models/{py_llama_cpp,c_transformers,llama_cpp_official,binding_template,gpt_j_m,gpt_4all,open_ai,gpt_j_a,gptq,hugging_face} || true
# Activate the python environment and start the web server
source /opt/venv/bin/activate
cd /src/lollms-webui
python app.py -m Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_K_S.bin --host 0.0.0.0 --port 8080 --n_threads "$CPU_THREADS"