
RockAfeller2013 / llm-util.py
Created May 29, 2024 23:17 (forked from jrknox1977/llm-util.py)
This Python script demonstrates how to interact with multiple AI models from different providers using their respective APIs.
# Install dependencies:
#   python3 -m pip install openai groq anthropic google-generativeai python-dotenv
import os
from dotenv import load_dotenv
from openai import OpenAI
from groq import Groq
import anthropic
import google.generativeai as genai

# Pull the provider API keys from a .env file into the environment.
load_dotenv()
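As a rough sketch of where the script goes from here (the client constructors below follow each SDK's defaults, which read their keys from the environment; the GOOGLE_API_KEY name and the model name are assumptions, and the full gist may wire this up differently):

# Each SDK reads its key from the environment by default
# (OPENAI_API_KEY, GROQ_API_KEY, ANTHROPIC_API_KEY).
openai_client = OpenAI()
groq_client = Groq()
anthropic_client = anthropic.Anthropic()
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # key name assumed

# Example: send one prompt through the OpenAI client.
response = openai_client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)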
RockAfeller2013 / multi_ollama_containers.md
Created February 21, 2024 21:03 (forked from jrknox1977/multi_ollama_containers.md)
Running multiple Ollama containers on a single host.

Multiple Ollama Containers on a single host (with multiple GPUs)

I don't want model RELOADS

  • I have a large machine with 2 GPUs and a considerable amount of RAM.
  • I was trying to use Ollama to serve llava and mistral, BUT it would reload the models every time I switched between them.
  • So this is the solution that appears to be working: multiple containers, each serving a different model on its own port (sketched below).
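
A minimal launch sketch under those constraints, assuming Docker with the NVIDIA container toolkit and the official ollama/ollama image (the container names, GPU indices, and host ports here are illustrative, not from the original gist):

import subprocess

# One Ollama container per model: each pinned to its own GPU and
# published on its own host port, so switching models never forces a reload.
containers = [
    ("ollama-llava", "0", "11434"),    # (name, GPU index, host port) - assumed
    ("ollama-mistral", "1", "11435"),
]
for name, gpu, port in containers:
    subprocess.run([
        "docker", "run", "-d",
        "--name", name,
        "--gpus", f"device={gpu}",      # pin this container to one GPU
        "-v", "/usr/share/ollama/.ollama:/root/.ollama",  # reuse host models (next section)
        "-p", f"{port}:11434",          # Ollama listens on 11434 inside the container
        "ollama/ollama",
    ], check=True)

Each container then answers on its own port, so requests for llava and mistral land on different servers and neither model gets evicted.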

Ollama model working dir:

  • I have many models already downloaded on my machine, so I mount the host Ollama working dir into the containers.
  • Linux (at least on my Linux machine): /usr/share/ollama/.ollama
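
To check that a container actually sees the host's already-downloaded models, Ollama's /api/tags endpoint lists what each instance can serve (ports assumed from the launch sketch above):

import json
import urllib.request

# If the host working dir is mounted correctly, the models downloaded
# on the host should show up in every container's listing.
for port in ("11434", "11435"):
    with urllib.request.urlopen(f"http://localhost:{port}/api/tags") as resp:
        tags = json.load(resp)
    print(port, [m["name"] for m in tags.get("models", [])])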