Presentation on setting up Ollama - nafeu.com/minimal-elearning
---
title: "Setting up a local LLM with Ollama"
author: "Nafeu Nasir"
description: "Getting up and running with Ollama and integrating it into your workflow."
date: 2024-01-24
---
# What is a local large language model? (LLM)
You've heard of LLMs, you've heard of ChatGPT, Bard, or Grok. Well, a local LLM is one you can run on your own computer. This means you don't need to rely on a cloud service to use it (better data privacy and security).
<iframe src="https://giphy.com/embed/qAtZM2gvjWhPjmclZE" width="480" height="296" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/southpark-south-park-deep-learning-s26e4-qAtZM2gvjWhPjmclZE">via GIPHY</a></p>
+++
# What is Ollama?
Ollama allows you to run open-source large language models, such as Llama 2, locally.
Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.
It optimizes setup and configuration details, including GPU usage.
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</picture>
</div>
+++
# What kind of computer does it require?
Ollama generally supports machines with 8GB of memory (preferably VRAM). Mac and Linux machines are both supported, although on Linux you'll currently need an Nvidia GPU for GPU acceleration.
You'll see a noticeable difference with GPU acceleration or running on Apple Silicon versus a non-Apple Silicon CPU.
Models are typically sized to use 1/2 to 2/3 of total machine RAM, and the more RAM you can allocate, the better the performance.
So if you have `16GB` you'll probably allocate `8GB` for the model, `32GB` -> `16GB` for the model, `64GB` -> `32GB` for the model, etc.
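Not sure how much memory your machine has? A quick check (these are standard OS commands, nothing Ollama-specific):
```
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux / WSL2: human-readable memory summary
free -h
```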
<iframe src="https://giphy.com/embed/P7JmDW7IkB7TW" width="480" height="328" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/one-direction-zayn-malik-abhhhhhhhhhhhhhhhhhhhhhhhh-P7JmDW7IkB7TW">via GIPHY</a></p>
+++
# How do you set it up?
### macOS
Visit Ollama's website `ollama.ai` and hit download.
### Windows
Coming soon! For now, you can install Ollama on Windows via WSL2.
### Linux & WSL2
```
curl https://ollama.ai/install.sh | sh
```
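Whichever platform you're on, you can sanity-check the install from a terminal (this assumes the `ollama` binary ended up on your PATH):
```
# print the installed version
ollama --version
```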
<iframe src="https://giphy.com/embed/7hJZcKzjIufeOmqKSj" width="480" height="356" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/grandma-grandmother-offline-granny-7hJZcKzjIufeOmqKSj">via GIPHY</a></p>
+++
# Quickstart
To run and chat with [Llama 2](https://ollama.ai/library/llama2):
```
ollama run llama2
```
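The first run downloads the model weights, then drops you into an interactive chat prompt. To leave the session, use the `/bye` slash command (you can see the other slash commands with `/?`):
```
>>> /bye
```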
<iframe src="https://giphy.com/embed/gGldiUgAUOJ04g1Ves" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/europeanathletics-race-mascot-start-gGldiUgAUOJ04g1Ves">via GIPHY</a></p>
+++
# What are some of the models available?
Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library')
Here are some example open-source models that can be downloaded:
| Model | Parameters | Size | Download |
| ------------------ | ---------- | ----- | ------------------------------ |
| Llama 2 | 7B | 3.8GB | `ollama run llama2` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Dolphin Phi | 2.7B | 1.6GB | `ollama run dolphin-phi` |
| Phi-2 | 2.7B | 1.7GB | `ollama run phi` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| Llama 2 13B | 13B | 7.3GB | `ollama run llama2:13b` |
| Llama 2 70B | 70B | 39GB | `ollama run llama2:70b` |
| Orca Mini | 3B | 1.9GB | `ollama run orca-mini` |
| Vicuna | 7B | 3.8GB | `ollama run vicuna` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
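You don't have to start a chat to grab a model. A quick sketch of the model-management side of the CLI: pre-download with `pull` and see what's on disk with `list`:
```
# download a model without starting a chat
ollama pull mistral

# show downloaded models and their sizes
ollama list
```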
+++
# How can I customize the model with prompts?
Create a `Modelfile`:
```
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```
Next, create and run the model:
```
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
```
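Custom models created this way show up in `ollama list` like any other model. If you want to clean up the Mario experiment afterwards:
```
# delete the custom model
ollama rm mario
```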
+++
# Some cool features:
### Multiline input
For multiline input, you can wrap text with `"""`:
```
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
```
### Multimodal models
To ask questions about images, use a multimodal model such as LLaVA (`ollama run llava`) and include the image path in your prompt:
```
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
```
### Pass in prompt as arguments
```
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
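Because the prompt is just a command-line argument, this composes with ordinary shell plumbing. For example (plain shell redirection, nothing Ollama-specific), you can capture the summary to a file:
```
ollama run llama2 "Summarize this file: $(cat README.md)" > summary.txt
```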
+++
# How can I interact with the model outside of the terminal?
## REST API
Ollama ships with a REST API for running and managing models. Start the server with `ollama serve` instead of `ollama run` (though it's usually already running in the background even when you use `ollama run`):
### Generate a response
```
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
```
### Chat with a model
```
curl http://localhost:11434/api/chat -d '{
"model": "mistral",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
```
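By default these endpoints stream the response back as a series of JSON objects. If you'd rather get a single complete JSON response (for example, to pipe into a script), set `stream` to `false`. A minimal sketch:
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```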
+++
# Check out Fireship.io's great video on running Mistral's 8x7B Model
<iframe width="560" height="315" src="https://www.youtube.com/embed/GyllRd2E6fg?si=_tG71hDNLswF1CSX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+++
# Let's play with the Code Llama model (`ollama run codellama`)
<iframe src="https://giphy.com/embed/vIqU5gwdCPKO4" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/llama-vIqU5gwdCPKO4">via GIPHY</a></p>
# What kind of editor integrations are there?
### VSCode
[Continue](https://github.com/continuedev/continue) is an open-source AI code assistant extension that can use Ollama as its backend.