@bigsnarfdude
Last active February 15, 2024 19:21
local_llm.py using ollama on mac mps metal
'''
Self-host an inference server with basic RAG support:
- get a Mac Mini or Mac Studio
- run `ollama serve`
- run ollama web-ui in Docker
- add a coding assistant model from ollamahub through the web-ui
- upload your documents in the web-ui

No code needed: you now have a self-hosted LLM with basic RAG that answers with your
documents in context. For us, the deepseek-coder 33b model is fast enough on a
Mac Studio with 64 GB RAM and gives pretty good suggestions based on our internal
coding documentation.
'''
from openai import OpenAI

# Point the OpenAI client at the local Ollama server's OpenAI-compatible API.
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # required by the client, but unused by Ollama
)

# Standard multi-turn chat completion request against a locally served model.
response = client.chat.completions.create(
    model="llama2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The LA Dodgers won in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
print(response.choices[0].message.content)
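
# ---------------------------------------------------------------------------
# Minimal "documents in context" sketch (an addition, not part of the original
# gist): the same chat endpoint can do basic RAG by pasting document text into
# the prompt. The file path, question, and model name below are assumptions;
# swap in your own document and whichever model you have pulled with
# `ollama pull` (e.g. a deepseek-coder tag).
# ---------------------------------------------------------------------------
from pathlib import Path

doc_text = Path("internal_coding_guidelines.txt").read_text()  # hypothetical file

rag_response = client.chat.completions.create(
    model="llama2",
    messages=[
        {
            "role": "system",
            "content": (
                "Answer using only the documentation below.\n\n"
                "--- DOCUMENTATION ---\n" + doc_text
            ),
        },
        {"role": "user", "content": "What is our naming convention for modules?"},
    ],
)
print(rag_response.choices[0].message.content)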