local_llm.py: using Ollama on a Mac (Metal/MPS)
'''
Self-host an inference server with basic RAG support:

- get a Mac Mini or Mac Studio
- run `ollama serve`
- run the ollama web-ui in Docker
- add a coding assistant model from ollamahub via the web-ui
- upload your documents in the web-ui

No code needed: you have a self-hosted LLM with basic RAG that answers with your
documents in context. For us, the deepseek-coder 33B model is fast enough on a
Mac Studio with 64 GB of RAM and gives pretty good suggestions based on our
internal coding documentation. A command sketch follows below.
'''
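# A minimal setup sketch for the steps above. These shell commands are assumptions
# based on the ollama CLI and the Open WebUI docs; adjust model tags and ports to
# your machine:
#
#   ollama serve                       # start the local API server on port 11434
#   ollama pull deepseek-coder:33b     # fetch a coding model
#   docker run -d -p 3000:8080 \
#     --add-host=host.docker.internal:host-gateway \
#     -v open-webui:/app/backend/data --name open-webui \
#     ghcr.io/open-webui/open-webui:main
#   # then open http://localhost:3000, pick the model, and upload documents for RAG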
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client, but unused by ollama
)

response = client.chat.completions.create(
    model="llama2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The LA Dodgers won in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)

print(response.choices[0].message.content)
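# Assumed usage note: this expects `ollama serve` to be running locally and the
# model already pulled (e.g. `ollama pull llama2`). Ollama exposes an
# OpenAI-compatible API under /v1, so the same code works with any local model
# tag by changing the `model` argument.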