

@garrett
Created June 25, 2024 18:43
ollama with rocm (AMD GPU) and webui, in podman
podman run --pull newer --detach --security-opt label=type:container_runtime_t --replace --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
podman run --replace --pull newer -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
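
Once both containers are up, Ollama answers on port 11434 and Open WebUI (which listens on port 8080 by default when run with --network=host) should be reachable in a browser at http://127.0.0.1:8080. As a quick sanity check, assuming curl is available on the host:

curl http://127.0.0.1:11434
# should print: Ollama is running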

garrett commented Jun 25, 2024

To run it without GPU acceleration, remove a few things from the first podman command so it looks like this instead:

podman run --pull newer --detach --replace -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
podman run --replace --pull newer -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main

(Basically, this removes the SELinux labeling option, the GPU device paths, and the :rocm tag from the image. The open-webui part stays the same.)

Depending on your GPU, you can run a range of models, and most will be reasonably fast. On CPU it'll be slower, but smaller models like mistral, phi3, or granite are still decent. Downloading a model takes a while depending on its size, and smaller models are both quicker to run and need less memory (VRAM on GPU, system RAM on CPU).
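
You can also pull and try a model from the command line instead of through the web UI, by running ollama inside the container; a minimal sketch (mistral here is just an example, any model from the Ollama library works the same way):

podman exec -it ollama ollama pull mistral
podman exec -it ollama ollama run mistral

The first command only downloads the model; the second opens an interactive chat in the terminal. Models pulled this way should also show up in Open WebUI, since both talk to the same ollama instance and volume.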
