@jrknox1977
jrknox1977 / llm-util.py
Created May 29, 2024 17:15
This Python script demonstrates how to interact with multiple AI models from different providers using their respective APIs.
# Install dependencies:
# python3 -m pip install openai groq anthropic google-generativeai python-dotenv
import os
from dotenv import load_dotenv
from openai import OpenAI
from groq import Groq
import anthropic
import google.generativeai as genai
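Only the gist's imports are shown above. A minimal sketch of how such a multi-provider setup is typically wired: API keys come from the environment (or a `.env` file via python-dotenv), and each provider's client is built on demand. The environment-variable names, function names, and provider labels here are my assumptions, not the gist's actual code.

```python
import os

# Keep SDK imports optional so the sketch runs even without python-dotenv.
try:
    from dotenv import load_dotenv
    load_dotenv()  # pull keys from a local .env file, if present
except ImportError:
    pass

# Assumed env var names -- match them to your .env file.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}

def api_key_for(provider: str) -> str:
    """Look up the API key for a provider, failing loudly if unset."""
    var = PROVIDER_KEY_VARS[provider]
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"set {var} in your environment or .env file")
    return key

def make_client(provider: str):
    """Construct a provider client lazily, so a missing SDK only
    fails when that provider is actually used."""
    if provider == "openai":
        from openai import OpenAI
        return OpenAI(api_key=api_key_for(provider))
    if provider == "groq":
        from groq import Groq
        return Groq(api_key=api_key_for(provider))
    if provider == "anthropic":
        import anthropic
        return anthropic.Anthropic(api_key=api_key_for(provider))
    if provider == "google":
        import google.generativeai as genai
        genai.configure(api_key=api_key_for(provider))
        return genai
    raise ValueError(f"unknown provider: {provider}")
```

Lazy imports keep the script usable when only some of the four SDKs are installed.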
jrknox1977 / multi_ollama_containers.md
Last active February 21, 2024 21:07
Running Multiple ollama containers on a single host.

Multiple Ollama Containers on a single host (with multiple GPUs)

I don't want model RELOAD

  • I have a large machine with 2 GPUs and a considerable amount of RAM.
  • I was trying to use ollama to serve llava and mistral, but it would reload the models every time I switched between them.
  • The solution that appears to work: multiple containers, each serving a different model on its own port.
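The bullets above can be sketched as two `docker run` invocations. The container names, host ports, and GPU assignments below are my choices for illustration, not the gist's exact commands; the volume path is the Linux default noted later in these notes.

```shell
# One container per model, each pinned to its own GPU and host port,
# sharing the host's existing Ollama model directory.
docker run -d --gpus '"device=0"' -p 11434:11434 \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  --name ollama-mistral ollama/ollama

docker run -d --gpus '"device=1"' -p 11435:11434 \
  -v /usr/share/ollama/.ollama:/root/.ollama \
  --name ollama-llava ollama/ollama
```

Because each container holds one model resident, switching between models just means switching ports, with no reload.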

Ollama model working dir:

  • I already have many models downloaded on my machine, so I mount the host's ollama working dir into the containers.
  • Linux (at least on my machine): /usr/share/ollama/.ollama
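With one container per model, clients pick a model by targeting that container's port. A quick check, assuming containers were published on host ports 11434 and 11435 (my assumed ports, not from the gist):

```shell
# Each container answers on its own port, so no reload when switching models.
curl http://localhost:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'

curl http://localhost:11435/api/generate \
  -d '{"model": "llava", "prompt": "Describe this image."}'
```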