Presentation on setting up Ollama - nafeu.com/minimal-elearning
---
title: "Setting up a local LLM with Ollama"
author: "Nafeu Nasir"
description: "Getting up and running with Ollama and integrating it into your workflow."
date: 2024-01-24
---
# What is a local large language model? (LLM)
You've heard of LLMs, you've heard of ChatGPT, Bard, or Grok. Well, a local LLM is one you can run on your own computer. This means you don't need to rely on a cloud service to use it (better data privacy and security).
<iframe src="https://giphy.com/embed/qAtZM2gvjWhPjmclZE" width="480" height="296" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/southpark-south-park-deep-learning-s26e4-qAtZM2gvjWhPjmclZE">via GIPHY</a></p>
+++
# What is Ollama?
Ollama allows you to run open-source large language models, such as Llama 2, locally.
Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.
It optimizes setup and configuration details, including GPU usage.
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
<img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</picture>
</div>
+++
# What kind of computer does it require?
Ollama generally supports machines with 8GB of memory (preferably VRAM). Mac and Linux machines are both supported, although on Linux you'll currently need an Nvidia GPU for GPU acceleration.
You'll see a noticeable difference with GPU acceleration or running on Apple Silicon versus a non-Apple Silicon CPU.
Models are typically sized to use 1/2 to 2/3 of total machine RAM, and the more RAM you can allocate, the better the performance.
So if you have `16GB` you'll probably allocate `8GB` for the model, `32GB` -> `16GB` for the model, `64GB` -> `32GB` for the model, etc.
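Not sure how much memory your machine has? A quick check (these are standard OS commands, nothing Ollama-specific):
```
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux / WSL2: human-readable memory summary
free -h
```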
<iframe src="https://giphy.com/embed/P7JmDW7IkB7TW" width="480" height="328" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/one-direction-zayn-malik-abhhhhhhhhhhhhhhhhhhhhhhhh-P7JmDW7IkB7TW">via GIPHY</a></p>
+++
# How do you set it up?
### macOS
Visit Ollama's website `ollama.ai` and hit download.
### Windows
Coming soon! For now, you can install Ollama on Windows via WSL2.
### Linux & WSL2
```
curl https://ollama.ai/install.sh | sh
```
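Whichever platform you're on, you can sanity-check the install from a terminal (this assumes the `ollama` binary ended up on your PATH):
```
# print the installed version
ollama --version
```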
<iframe src="https://giphy.com/embed/7hJZcKzjIufeOmqKSj" width="480" height="356" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/grandma-grandmother-offline-granny-7hJZcKzjIufeOmqKSj">via GIPHY</a></p>
+++
# Quickstart
To run and chat with [Llama 2](https://ollama.ai/library/llama2):
```
ollama run llama2
```
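The first run downloads the model weights, then drops you into an interactive chat prompt. To leave the session, use the `/bye` slash command (you can see the other slash commands with `/?`):
```
>>> /bye
```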
<iframe src="https://giphy.com/embed/gGldiUgAUOJ04g1Ves" width="480" height="480" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/europeanathletics-race-mascot-start-gGldiUgAUOJ04g1Ves">via GIPHY</a></p>
+++
# What are some of the models available?
Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library')
Here are some example open-source models that can be downloaded:
| Model | Parameters | Size | Download |
| ------------------ | ---------- | ----- | ------------------------------ |
| Llama 2 | 7B | 3.8GB | `ollama run llama2` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Dolphin Phi | 2.7B | 1.6GB | `ollama run dolphin-phi` |
| Phi-2 | 2.7B | 1.7GB | `ollama run phi` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| Llama 2 13B | 13B | 7.3GB | `ollama run llama2:13b` |
| Llama 2 70B | 70B | 39GB | `ollama run llama2:70b` |
| Orca Mini | 3B | 1.9GB | `ollama run orca-mini` |
| Vicuna | 7B | 3.8GB | `ollama run vicuna` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
> Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
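You don't have to start a chat to grab a model. A quick sketch of the model-management side of the CLI: pre-download with `pull` and see what's on disk with `list`:
```
# download a model without starting a chat
ollama pull mistral

# show downloaded models and their sizes
ollama list
```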
+++
# How can I customize the model with prompts?
Create a `Modelfile`:
```
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```
Next, create and run the model:
```
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
```
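Custom models created this way show up in `ollama list` like any other model. If you want to clean up the Mario experiment afterwards:
```
# delete the custom model
ollama rm mario
```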
+++
# Some cool features:
### Multiline input
For multiline input, you can wrap text with `"""`:
```
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
```
### Multimodal models
To ask questions about images, use a multimodal model such as LLaVA (`ollama run llava`) and include the image path in your prompt:
```
>>> What's in this image? /Users/jmorgan/Desktop/smile.png
The image features a yellow smiley face, which is likely the central focus of the picture.
```
### Pass in prompt as arguments
```
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
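Because the prompt is just a command-line argument, this composes with ordinary shell plumbing. For example (plain shell redirection, nothing Ollama-specific), you can capture the summary to a file:
```
ollama run llama2 "Summarize this file: $(cat README.md)" > summary.txt
```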
+++
# How can I interact with the model outside of the terminal?
## REST API
Ollama ships with a REST API for running and managing models. Start the server with `ollama serve` instead of `ollama run` (though it's usually already running in the background even when you use `ollama run`):
### Generate a response
```
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
```
### Chat with a model
```
curl http://localhost:11434/api/chat -d '{
"model": "mistral",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
```
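By default these endpoints stream the response back as a series of JSON objects. If you'd rather get a single complete JSON response (for example, to pipe into a script), set `stream` to `false`. A minimal sketch:
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```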
+++
# Check out Fireship.io's great video on running Mistral's 8x7B Model
<iframe width="560" height="315" src="https://www.youtube.com/embed/GyllRd2E6fg?si=_tG71hDNLswF1CSX" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+++
# Let's play with the Code Llama model (`ollama run codellama`)
<iframe src="https://giphy.com/embed/vIqU5gwdCPKO4" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/llama-vIqU5gwdCPKO4">via GIPHY</a></p>
# What kind of editor integrations are there?
### VSCode
[Continue](https://github.com/continuedev/continue) is an open-source AI code assistant extension that can use Ollama as its backend.