
RockAfeller2013 / llm-util.py
Created May 29, 2024 23:17 (forked from jrknox1977/llm-util.py)
This Python script demonstrates how to interact with multiple AI models from different providers using their respective APIs.
# Install dependencies:
#   python3 -m pip install openai groq anthropic google-generativeai python-dotenv
import os
from dotenv import load_dotenv
from openai import OpenAI
from groq import Groq
import anthropic
import google.generativeai as genai

# Pull the provider API keys from a .env file into the environment.
load_dotenv()
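As a rough sketch of where the script goes from here (the client constructors below follow each SDK's defaults, which read their keys from the environment; the GOOGLE_API_KEY name and the model name are assumptions, and the full gist may wire this up differently):

# Each SDK reads its key from the environment by default
# (OPENAI_API_KEY, GROQ_API_KEY, ANTHROPIC_API_KEY).
openai_client = OpenAI()
groq_client = Groq()
anthropic_client = anthropic.Anthropic()
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # key name assumed

# Example: send one prompt through the OpenAI client.
response = openai_client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)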
RockAfeller2013 / multi_ollama_containers.md
Created February 21, 2024 21:03 (forked from jrknox1977/multi_ollama_containers.md)
Running multiple Ollama containers on a single host.

Multiple Ollama Containers on a single host (with multiple GPUs)

I don't want model RELOADS

  • I have a large machine with 2 GPUs and a considerable amount of RAM.
  • I was trying to use Ollama to serve llava and mistral, BUT it would reload the models every time I switched between them.
  • So this is the solution that appears to be working: multiple containers, each serving a different model on its own port (sketched below).
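
A minimal launch sketch under those constraints, assuming Docker with the NVIDIA container toolkit and the official ollama/ollama image (the container names, GPU indices, and host ports here are illustrative, not from the original gist):

import subprocess

# One Ollama container per model: each pinned to its own GPU and
# published on its own host port, so switching models never forces a reload.
containers = [
    ("ollama-llava", "0", "11434"),    # (name, GPU index, host port) - assumed
    ("ollama-mistral", "1", "11435"),
]
for name, gpu, port in containers:
    subprocess.run([
        "docker", "run", "-d",
        "--name", name,
        "--gpus", f"device={gpu}",      # pin this container to one GPU
        "-v", "/usr/share/ollama/.ollama:/root/.ollama",  # reuse host models (next section)
        "-p", f"{port}:11434",          # Ollama listens on 11434 inside the container
        "ollama/ollama",
    ], check=True)

Each container then answers on its own port, so requests for llava and mistral land on different servers and neither model gets evicted.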

Ollama model working dir:

  • I have many models already downloaded on my machine, so I mount the host Ollama working dir into the containers.
  • Linux (at least on my Linux machine): /usr/share/ollama/.ollama
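
To check that a container actually sees the host's already-downloaded models, Ollama's /api/tags endpoint lists what each instance can serve (ports assumed from the launch sketch above):

import json
import urllib.request

# If the host working dir is mounted correctly, the models downloaded
# on the host should show up in every container's listing.
for port in ("11434", "11435"):
    with urllib.request.urlopen(f"http://localhost:{port}/api/tags") as resp:
        tags = json.load(resp)
    print(port, [m["name"] for m in tags.get("models", [])])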