Simplest Ways to Chat with LLMs Locally

Motivation

Many people want to get started with using chatbots locally. However, they are often intimidated by the complexity and the steep learning curve needed to run even a basic chatbot.

In this gist, I share a few simple methods to run a chatbot locally without the need to do 69 installation steps. All you need is a machine with enough memory to run the model; a computer with 8 GB of CPU RAM is the minimum requirement.

These are the currently recommended methods.

Use the Docker method if you're:

  1. Already familiar with Docker (simple basic knowledge is all that's needed, nothing too complicated)
  2. Looking to learn Docker

Use the llamafile method if you want the simplest way possible.

Docker Containers

This is a super simple guide to running a chatbot locally using Docker containers.

Pre-requisites

All you need is:

  1. Docker
  2. A model

Docker

To install Docker on Ubuntu, simply run:

sudo apt install docker.io

Model

You can select any model you want as long as it's in GGUF format. I recommend openchat-3.5-1210.Q4_K_M to get started: it requires about 6 GB of memory and can run without a GPU.

All you need to do is to:

  1. Create a models folder somewhere
  2. Download a model (like the one above; see the example below)
  3. Put the downloaded model inside the models folder
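
For example, here's a sketch of all three steps in one go, assuming the model is hosted in TheBloke's GGUF repository on Hugging Face (verify the repository and exact file name yourself before running this):

mkdir -p ~/models
curl -L -o ~/models/openchat-3.5-1210.Q4_K_M.gguf https://huggingface.co/TheBloke/openchat-3.5-1210-GGUF/resolve/main/openchat-3.5-1210.Q4_K_M.gguf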

Running

1. Download the Docker image:

sudo docker pull ghcr.io/ggerganov/llama.cpp:full

2. Run the server

sudo docker run -p 8181:8181 --network bridge -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --server -m /models/openchat-3.5-1210.Q4_K_M.gguf -c 2048 -ngl 43 -mg 1 --port 8181 --host 0.0.0.0
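
Note that -ngl 43 offloads the model's layers to a GPU and -mg 1 selects which GPU to use; these only take effect if the container can actually see your GPU (e.g. via the NVIDIA Container Toolkit and Docker's --gpus all flag). A CPU-only sketch, assuming no GPU is available, simply drops those flags:

sudo docker run -p 8181:8181 --network bridge -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --server -m /models/openchat-3.5-1210.Q4_K_M.gguf -c 2048 --port 8181 --host 0.0.0.0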

3. Start chatting

Now open a browser and go to http://127.0.0.1:8181/ and start chatting with the model!
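
The server also exposes an HTTP API. As a quick sketch (adjust the prompt and token count to taste), you can hit llama.cpp's /completion endpoint with curl:

curl http://127.0.0.1:8181/completion -H "Content-Type: application/json" -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'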

Llama Files

This is an even simpler guide to running a chatbot locally using llamafile.

Pre-requisites

All you need is:

  1. Llamafile server
  2. A model

Llamafile Server

Go to https://github.com/Mozilla-Ocho/llamafile/releases/ and download llamafile-x.y.z, where x.y.z is the version. For example, here's how to download llamafile-0.8.1:

curl -LO https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.1/llamafile-0.8.1

Now, make it executable as follows:

chmod 755 ./llamafile-0.8.1

Model

You can select any model you want as long as it's in GGUF format. I recommend Meta-Llama-3-8B-Instruct-Q5_K_M to get started: it requires about 6 GB of memory and can run without a GPU.

All you need to do is to:

  1. Create a models folder somewhere
  2. Download a model (like the one above; see the example below)
  3. Put the downloaded model inside the models folder
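
As in the Docker section, here's a sketch of the download, assuming this quant is hosted in QuantFactory's GGUF repository on Hugging Face (an assumption; verify the repository and exact file name before running this):

curl -L -o models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf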

Running

1. Run the server with the specified model

./llamafile-0.8.1 -m models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf 
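
By default, llamafile serves on port 8080 and opens the web UI in your browser. If that port is taken, you can override the bind address and port with the usual llama.cpp server flags (a sketch, assuming your llamafile version passes these through; check ./llamafile-0.8.1 --help):

./llamafile-0.8.1 -m models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf --host 127.0.0.1 --port 8081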

2. Start chatting

Now open a browser and go to http://127.0.0.1:8080/ and start chatting with the model!

3. Using the API

Llamafile uses the llama.cpp server under the hood, which provides an OpenAI-compatible API. You can make requests to the API as follows:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
  "messages": [
    {
      "role": "system",
      "content": "You will be provided with statements, and your task is to convert them to standard English."
    },
    {
      "role": "user",
      "content": "She not went to the market."
    }
  ],
  "temperature": 2.0,
  "max_tokens": 64,
  "top_p": 1
}'

This will return the following response:

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"She didn't go to the market. ","role":"assistant"}}],"created":1703860191,"id":"chatcmpl-blpd2RutXMbnqdbCnJkbvTR5cLlo9hvz","model":"/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf","object":"chat.completion","usage":{"completion_tokens":15,"prompt_tokens":68,"total_tokens":83}}