Simplest Ways to Chat with LLMs Locally

Motivation

Many people want to get started with using chatbots locally. However, they are often intimidated by the complexity and the steep learning curve needed to run even a basic chatbot.

In this gist, I share a few simple methods to run a chatbot locally without the need to do 69 installation steps. All you need is a machine with enough memory to run the model; a computer with 8 GB of CPU RAM is the minimum requirement.

These are the currently recommended methods.

Use the Docker method if you're:

  1. Already familiar with Docker (simple basic knowledge is all that's needed, nothing too complicated)
  2. Looking to learn Docker

Use the llamafile method if you want the simplest way possible.

Docker Containers

This is a super simple guide to running a chatbot locally using Docker containers.

Pre-requisites

All you need is:

  1. Docker
  2. A model

Docker

To install Docker on Ubuntu, simply run:

sudo apt install docker.io

Model

You can select any model you want as long as it's in GGUF format. I recommend openchat-3.5-1210.Q4_K_M to get started: it requires about 6 GB of memory and can run without a GPU.

All you need to do is to:

  1. Create a models folder somewhere
  2. Download a model (like the one above; see the example below)
  3. Put the downloaded model inside the models folder
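
For example, here's a sketch of all three steps in one go, assuming the model is hosted in TheBloke's GGUF repository on Hugging Face (verify the repository and exact file name yourself before running this):

mkdir -p ~/models
curl -L -o ~/models/openchat-3.5-1210.Q4_K_M.gguf https://huggingface.co/TheBloke/openchat-3.5-1210-GGUF/resolve/main/openchat-3.5-1210.Q4_K_M.gguf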

Running

1. Download the Docker image:

sudo docker pull ghcr.io/ggerganov/llama.cpp:full

2. Run the server

sudo docker run -p 8181:8181 --network bridge -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --server -m /models/openchat-3.5-1210.Q4_K_M.gguf -c 2048 -ngl 43 -mg 1 --port 8181 --host 0.0.0.0
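
Note that -ngl 43 offloads the model's layers to a GPU and -mg 1 selects which GPU to use; these only take effect if the container can actually see your GPU (e.g. via the NVIDIA Container Toolkit and Docker's --gpus all flag). A CPU-only sketch, assuming no GPU is available, simply drops those flags:

sudo docker run -p 8181:8181 --network bridge -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --server -m /models/openchat-3.5-1210.Q4_K_M.gguf -c 2048 --port 8181 --host 0.0.0.0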

3. Start chatting

Now open a browser and go to http://127.0.0.1:8181/ and start chatting with the model!
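
The server also exposes an HTTP API. As a quick sketch (adjust the prompt and token count to taste), you can hit llama.cpp's /completion endpoint with curl:

curl http://127.0.0.1:8181/completion -H "Content-Type: application/json" -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'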

Llama Files

This is an even simpler guide to running a chatbot locally using llamafile.

Pre-requisites

All you need is:

  1. Llamafile server
  2. A model

Llamafile Server

Go to https://github.com/Mozilla-Ocho/llamafile/releases/ and download llamafile-x.y.z, where x.y.z is the version. For example, here's how to download llamafile-0.8.1:

curl -LO https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.1/llamafile-0.8.1

Now, make it executable as follows:

chmod 755 ./llamafile-0.8.1

Model

You can select any model you want as long as it's in GGUF format. I recommend Meta-Llama-3-8B-Instruct-Q5_K_M to get started: it requires about 6 GB of memory and can run without a GPU.

All you need to do is to:

  1. Create a models folder somewhere
  2. Download a model (like the one above; see the example below)
  3. Put the downloaded model inside the models folder
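
As in the Docker section, here's a sketch of the download, assuming this quant is hosted in QuantFactory's GGUF repository on Hugging Face (an assumption; verify the repository and exact file name before running this):

curl -L -o models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf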

Running

1. Run the server with the specified model

./llamafile-0.8.1 -m models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf 
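
By default, llamafile serves on port 8080 and opens the web UI in your browser. If that port is taken, you can override the bind address and port with the usual llama.cpp server flags (a sketch, assuming your llamafile version passes these through; check ./llamafile-0.8.1 --help):

./llamafile-0.8.1 -m models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf --host 127.0.0.1 --port 8081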

2. Start chatting

Now open a browser and go to http://127.0.0.1:8080/ and start chatting with the model!

3. Using the API

Llamafile uses the llama.cpp server under the hood, which provides an OpenAI-compatible API. You can make requests to the API as follows:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
  "model": "/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf",
  "messages": [
    {
      "role": "system",
      "content": "You will be provided with statements, and your task is to convert them to standard English."
    },
    {
      "role": "user",
      "content": "She not went to the market."
    }
  ],
  "temperature": 2.0,
  "max_tokens": 64,
  "top_p": 1
}'

This will return the following response:

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"She didn't go to the market. ","role":"assistant"}}],"created":1703860191,"id":"chatcmpl-blpd2RutXMbnqdbCnJkbvTR5cLlo9hvz","model":"/models/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf","object":"chat.completion","usage":{"completion_tokens":15,"prompt_tokens":68,"total_tokens":83}}