Ollama hacking with Docker + Shell + API Server + Models Storage

Start the Ollama API Server

$ docker run -d -v $HOME/.ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
9d2823c57c7ebc57c47456afa05e74a915c64c59c2904b7b5b4cc60a238f1b02
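  • Once the container is up, a quick curl against the root endpoint confirms the API is reachable (it should answer with a short "Ollama is running" message):
$ curl -s http://localhost:11434/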

Pull models with ollama pull MODEL

  • The model distribution format is very similar to Docker images, but it uses Ollama's own image structure
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama pull llama2
pulling manifest
pulling 8934d96d3f08... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   59 B
pulling fa304d675061... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   91 B
pulling 42ba7f8a01dd... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
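  • Every CLI call below repeats the same docker run flags, so a small shell function keeps the commands shorter (a convenience sketch; the name ollama is my own choice and would shadow a locally installed ollama binary):
ollama() {
  docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama "$@"
}
# e.g.: ollama list, ollama pull codellama:code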

List models with ollama list

$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME         	ID          	SIZE  	MODIFIED
llama2:latest	78e26419b446	3.8 GB	40 minutes ago

List Ollama's Models from the host volume

  • The local Ollama registry dir, which is mapped to the container volume
$ find ~/.ollama -name llama2
/Users/marcellodesales/.ollama/models/manifests/registry.ollama.ai/library/llama2

Pull Ollama models from the API Server /api/pull

  • First verify the running container and its port mapping
$ docker ps                                                                                  
CONTAINER ID   IMAGE                  COMMAND                  CREATED        STATUS        PORTS                       NAMES
13354dd76274   ollama/ollama          "/bin/ollama serve"      30 hours ago   Up 30 hours   0.0.0.0:11434->11434/tcp    ollama
  • NOTE: to make this production-ready, protect this API with OAuth, or disable it altogether.
  • NOTE2: this is a slow operation and the response is a long stream of JSON progress lines; it is mostly useful if you are building a UI on top of it
$ curl -X POST http://localhost:11434/api/pull -d '{    
  "name": "llama2:latest"      
}'


{"status":"downloading sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9","digest":"sha256:2759286baa875dc22de5394b4a925701b1896a7e3f8e53275c36f75a877a82c9","total":105}

{"status":"downloading sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","digest":"sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","total":529}
{"status":"downloading sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","digest":"sha256:5407e3188df9a34504e2071e0743682d859b68b6128f5c90994d0eafae29f722","total":529,"completed":529}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"removing any unused layers"}
{"status":"success"}

List Ollama models from the API Server /api/tags

  • Verify the list of models
$ curl -s http://localhost:11434/api/tags | jq
{
  "models": [
    {
      "name": "llama2:latest",
      "model": "llama2:latest",
      "modified_at": "2024-06-25T08:09:15.592678236Z",
      "size": 3826793677,
      "digest": "78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
  • This matches the output of the CLI
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME         	ID          	SIZE  	MODIFIED
llama2:latest	78e26419b446	3.8 GB	40 minutes ago
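  • For scripting, the same listing can be reduced with jq, e.g. to just names and sizes (a small sketch over the /api/tags payload shown above):
$ curl -s http://localhost:11434/api/tags | jq -r '.models[] | "\(.name)\t\(.size)"'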

Verify the structure of multiple models

  • Just use the CLI inside the docker image itself (the entrypoint)
  • Model blobs are downloaded into the blobs dir, along with their manifests
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama pull codellama:code
$ cat models/manifests/registry.ollama.ai/library/codellama/code | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "digest": "sha256:23fbdb4ea003a1e1c38187539cc4cc8e85c6fb80160a659e25894ca60e781a33",
    "size": 455
  },
  "layers": [
    {
      "mediaType": "application/vnd.ollama.image.model",
      "digest": "sha256:8b2eceb7b7a11c307bc9deed38b263e05015945dc0fa2f50c0744c5d49dd293e",
      "size": 3825898144
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b",
      "size": 7020
    },
    {
      "mediaType": "application/vnd.ollama.image.license",
      "digest": "sha256:590d74a5569b8a20eb2a8b0aa869d1d1d3faf6a7fdda1955ae827073c7f502fc",
      "size": 4790
    },
    {
      "mediaType": "application/vnd.ollama.image.params",
      "digest": "sha256:d2b44be9e12117ee2652e9a6c51df28ef408bf487e770b11ee0f7bce8790f3ca",
      "size": 31
    }
  ]
}
  • Listing the models again will show the models available
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME          	ID          	SIZE  	MODIFIED
codellama:code	fc84f39375bc	3.8 GB	3 minutes ago
llama2:latest 	78e26419b446	3.8 GB	50 minutes ago  

$ ls -la ~/.ollama/models/manifests/registry.ollama.ai/library/
total 0
drwxr-xr-x  4 marcellodesales  staff  128 Jun 24 23:10 .
drwxr-xr-x  3 marcellodesales  staff   96 Jun 24 22:23 ..
drwxr-xr-x  3 marcellodesales  staff   96 Jun 24 23:10 codellama
drwxr-xr-x  3 marcellodesales  staff   96 Jun 24 22:23 llama2
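  • Since each layer digest in a manifest maps to a blob file named sha256-<digest>, the blobs backing a model can be listed with jq (a small sketch assuming the default ~/.ollama layout shown above):
$ MANIFEST=~/.ollama/models/manifests/registry.ollama.ai/library/llama2/latest
$ jq -r '.config.digest, .layers[].digest' "$MANIFEST" | sed 's/:/-/' | xargs -I{} ls -lh ~/.ollama/models/blobs/{}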

Interacting with a Model via the API using curl

  • This uses the same API contract: just the model name and the prompt
$ curl -i -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt": "Why is the sky blue?"
}'
HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Date: Wed, 08 Nov 2023 18:59:09 GMT
Transfer-Encoding: chunked

{"model":"llama2","created_at":"2023-11-08T18:59:09.464416338Z","response":"\n","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:14.982356591Z","response":"The","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:19.764668885Z","response":" sky","done":false}
{"model":"llama2","created_at":"2023-11-08T18:59:24.59062022Z","response":" appears","done":false}
^C
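  • If you only want the final text rather than the token stream, /api/generate also accepts "stream": false (assuming a recent enough Ollama version) and returns a single JSON object:
$ curl -s http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "Why is the sky blue?", "stream": false }' | jq -r '.response'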
  • Given the structure of the stream, we can collect it and format the output for a UI however we like
  • For example, create a file called curl-api-prompt.sh that drives the API with curl
# Stream the generated tokens, echo each raw line, and accumulate the text into response.txt,
# adding a blank line roughly every 600 characters to keep the output readable.
counter=0;
: > response.txt   # start with a fresh output file

curl -s --no-buffer http://localhost:11434/api/generate -d '{ "model": "llama2", "prompt": "what model are you using?" }' | \
while read line; do
  # NOTE: read without -r turns the JSON escape \n into a literal "n",
  # which is why the newline check below compares against "n"
  done=$(echo "${line}" | jq -r '.done');
  echo "**** Current line is ${line}";
  if [ "${done}" == "true" ]; then
    break
  else
    response=$(echo "${line}" | jq -r '.response');
    if [ "${response}" = "n" ]; then
      # the model emitted a newline token: start a new paragraph instead of appending a literal "n"
      echo "" >> response.txt;
      echo "" >> response.txt;

      echo "RESPONSE SO FAR $(cat response.txt)"
      echo ""
    else
      echo -n "${response}" >> response.txt
    fi

    # force a paragraph break after ~600 characters
    counter=$((counter + ${#response}));
    if (( counter >= 600 )); then
      counter=0;
      echo "" >> response.txt
      echo "" >> response.txt
    fi
  fi
done
echo ""
echo "!!!! Complete Message: $(cat response.txt)"
  • Then, ask a question to the API server
$ bash curl-api-prompt.sh
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.141035794Z","response":"I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.245325002Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.765266961Z","response":"m","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:46.935886877Z","response":" just","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.049312086Z","response":" an","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.697848878Z","response":" A","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:47.818389878Z","response":"I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.338368878Z","response":",","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.71665142Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.813270087Z","response":" don","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:48.934463295Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.034801587Z","response":"t","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.181949795Z","response":" have","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.49937242Z","response":" personal","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.746886295Z","response":" prefer","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:49.836084212Z","response":"ences","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.140950754Z","response":" or","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.578687421Z","response":" use","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.780265962Z","response":" specific","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:50.877094587Z","response":" models","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.025753879Z","response":".","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.256480546Z","response":" My","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.478015629Z","response":" responses","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.778192463Z","response":" are","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:51.961603338Z","response":" generated","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:52.061262421Z","response":" based","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:53.737052422Z","response":" on","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:54.020325839Z","response":" patterns","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:54.511548506Z","response":" and","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:56.134965173Z","response":" relationships","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:56.786850465Z","response":" in","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:57.193616549Z","response":" language","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:57.746161299Z","response":" that","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:58.463891424Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:41:59.08879705Z","response":"'","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:00.1473888Z","response":"ve","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:00.881787467Z","response":" been","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:01.673808843Z","response":" trained","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:02.306899885Z","response":" on","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:03.007693135Z","response":".","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:03.490258844Z","response":" Is","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:04.058347719Z","response":" there","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:05.832360178Z","response":" something","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:08.234360971Z","response":" else","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:08.920473721Z","response":" I","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:10.05374643Z","response":" can","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:10.974550722Z","response":" help","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:11.604816958Z","response":" with","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:12.53442975Z","response":"?","done":false}
**** Current line is {"model":"llama2","created_at":"2024-06-25T06:42:15.15323871Z","response":"","done":true,"done_reason":"stop","context":[518,25580,29962,3532,14816,29903,29958,5299,829,14816,29903,6778,13,13,5816,1904,
526,366,773,29973,518,29914,25580,29962,13,29902,29915,29885,925,385,319,29902,29892,
306,1016,29915,29873,505,7333,5821,2063,470,671,2702,4733,29889,1619,20890,526,5759,
2729,373,15038,322,21702,297,4086,393,306,29915,345,1063,16370,373,29889,1317,727,1554,
1683,306,508,1371,411,29973],"total_duration":29212813305,"load_duration":4693792,
"prompt_eval_duration":152269000,"eval_count":50,"eval_duration":29013184000}

!!!! Complete Message: I'm just an AI, I don't have personal preferences or use 
specific models. My responses are generated based on patterns and relationships 
in language that I've been trained on. Is there something else I can help with?

Interacting with a Model using ollama run MODEL

  • Just run the docker image with the run command, pointing to the server that's already running
    • For testing, map the host network; the Ollama CLI will use the default port to talk to the server
    • Since a server is already running from the first commands above, the CLI will submit requests to it
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run llama2
>>> can we use this model for generating pdfs?
Yes, it is possible to use the transformer model for generating PDFs. In fact, there are several papers that have proposed using transformer models for generating PDFs,
including:

1. "Transformers for Generating Probabilistic Differential Equations" by J. P. Riley and A. M. C. S. Sousa (2020)
2. "A Transformer-Based Model for Generating Differential Equations" by H. Yu, et al. (2019)
3. "Generative Models^C

>>>
Use Ctrl + d or /bye to exit.

Cache Ollama Models as Docker Data Images

NOTE: this is a hack

  • That is, instead of using the Ollama pull command, I can back up any model already pulled from the registry

  • Using an old trick for caching data images, create a Dockerfile under Ollama's data dir

    • This bypasses proxy problems, since everything stays cloud-native.
  • Drop this Dockerfile under the ~/.ollama directory of your server to back up a model

# Model name and version are passed in as build args (e.g. MODEL=llama2, VERSION=latest)
ARG MODEL
ARG VERSION

# Stage 1: copy the host's Ollama models dir (manifests + blobs) into the build
FROM cfmanteiga/alpine-bash-curl-jq AS data

WORKDIR /.ollama/models

COPY models .

# Stage 2: resolve the blobs referenced by the model's manifest and keep only those
FROM data AS docker-blobs

ARG MODEL
ARG VERSION
ENV MODEL=$MODEL
ENV VERSION=$VERSION

WORKDIR /.ollama/backup/data

# copy the layer blobs listed in the manifest (digest sha256:X maps to blob file sha256-X)
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cat /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} | jq -r '.layers[].digest' | sed s/:/-/g | sed s,^,/.ollama/models/blobs/,g |  tr '\n' '\0' | xargs -Ifile cp file /.ollama/backup/data

# copy the config blob referenced by the manifest
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cat /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} | jq -r '.config.digest' | sed s/:/-/g | sed s,^,/.ollama/models/blobs/,g | xargs -Ifile cp file /.ollama/backup/data

# keep the manifest itself for the final stage
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && cp /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION} /.ollama/model-config

# Stage 3: assemble a minimal image with just the blobs and the manifest, laid out as Ollama expects
FROM busybox AS model-backup

WORKDIR /.ollama/models/blobs
COPY --from=docker-blobs /.ollama/backup/data /.ollama/models/blobs
COPY --from=docker-blobs /.ollama/model-config /.ollama/model-config

ARG MODEL
ARG VERSION
ENV MODEL=$MODEL
ENV VERSION=$VERSION

WORKDIR /.ollama/models/manifests/registry.ollama.ai/library/

# put the manifest back under the path the Ollama server expects
RUN export MODEL=${MODEL} && export VERSION=${VERSION} && \
    mkdir -p /.ollama/models/manifests/registry.ollama.ai/library/${MODEL} && \
    cp /.ollama/model-config /.ollama/models/manifests/registry.ollama.ai/library/${MODEL}/${VERSION}
  • Build the data image with the model you need to back up
    • Push it as well if needed...
$ docker buildx build --platform=linux/amd64 --tag marcellodesales/ollama-model-llama2:latest \
                      --build-arg VERSION=latest --build-arg MODEL=llama2 --target model-backup .
[+] Building 0.8s (19/19) FINISHED                                                                                                                       docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                     0.0s
 => => transferring dockerfile: 1.55kB                                                                                                                                   0.0s
 => [internal] load metadata for docker.io/cfmanteiga/alpine-bash-curl-jq:latest                                                                                         0.8s
 => [internal] load metadata for docker.io/library/busybox:latest                                                                                                        0.8s
 => [internal] load .dockerignore                                                                                                                                        0.0s
 => => transferring context: 2B                                                                                                                                          0.0s
 => [data 1/3] FROM docker.io/cfmanteiga/alpine-bash-curl-jq:latest@sha256:e09a3d5d52abb27830b44a2c279d09be66fad5bf476b3d02fb4a4a6125e377fc                              0.0s
 => [model-backup 1/6] FROM docker.io/library/busybox:latest@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7                                     0.0s
 => [internal] load build context                                                                                                                                        0.0s
 => => transferring context: 1.55kB                                                                                                                                      0.0s
 => CACHED [model-backup 2/6] WORKDIR /.ollama/models/blobs                                                                                                              0.0s
 => CACHED [data 2/3] WORKDIR /.ollama/models                                                                                                                            0.0s
 => CACHED [data 3/3] COPY models .                                                                                                                                      0.0s
 => CACHED [docker-blobs 1/4] WORKDIR /.ollama/backup/data                                                                                                               0.0s
 => CACHED [docker-blobs 2/4] RUN export MODEL=llama2 && export VERSION=latest && cat /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest | jq -r '.laye  0.0s
 => CACHED [docker-blobs 3/4] RUN export MODEL=llama2 && export VERSION=latest && cat /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest | jq -r '.conf  0.0s
 => CACHED [docker-blobs 4/4] RUN export MODEL=llama2 && export VERSION=latest && cp /.ollama/models/manifests/registry.ollama.ai/library/llama2/latest /.ollama/model-  0.0s
 => CACHED [model-backup 3/6] COPY --from=docker-blobs /.ollama/backup/data /.ollama/models/blobs                                                                        0.0s
 => CACHED [model-backup 4/6] COPY --from=docker-blobs /.ollama/model-config /.ollama/model-config                                                                       0.0s
 => CACHED [model-backup 5/6] WORKDIR /.ollama/models/manifests/registry.ollama.ai/library/                                                                              0.0s
 => CACHED [model-backup 6/6] RUN export MODEL=llama2 && export VERSION=latest &&     mkdir -p /.ollama/models/manifests/registry.ollama.ai/library/llama2 &&     cp /.  0.0s
 => exporting to image                                                                                                                                                   0.0s
 => => exporting layers                                                                                                                                                  0.0s
 => => writing image sha256:c885b759b1c0c31b399f29412ffbb84e7b41e54997a7dc7418adb3503ee3dcf9                                                                             0.0s
 => => naming to docker.io/marcellodesales/ollama-model-llama2:latest
  • The docker image contains only the selected model; you can copy its files into a local directory called model-backups under the host's .ollama dir.
$ docker run -v $PWD/model-backups:/data marcellodesales/ollama-model-llama2 cp -Rv /.ollama/models /data/
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
'/.ollama/models/blobs/sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988' -> '/data/models/blobs/sha256-2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988'
'/.ollama/models/blobs/sha256-fa304d6750612c207b8705aca35391761f29492534e90b30575e4980d6ca82f6' -> '/data/models/blobs/sha256-fa304d6750612c207b8705aca35391761f29492534e90b30575e4980d6ca82f6'
'/.ollama/models/blobs/sha256-8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b' -> '/data/models/blobs/sha256-8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b'
'/.ollama/models/blobs/sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9' -> '/data/models/blobs/sha256-42ba7f8a01ddb4fa59908edd37d981d3baa8d8efea0e222b027f29f7bcae21f9'
'/.ollama/models/blobs/sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246' -> '/data/models/blobs/sha256-8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246'
'/.ollama/models/blobs/sha256-7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d' -> '/data/models/blobs/sha256-7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d'
'/.ollama/models/blobs' -> '/data/models/blobs'
'/.ollama/models/manifests/registry.ollama.ai/library/llama2/latest' -> '/data/models/manifests/registry.ollama.ai/library/llama2/latest'
'/.ollama/models/manifests/registry.ollama.ai/library/llama2' -> '/data/models/manifests/registry.ollama.ai/library/llama2'
'/.ollama/models/manifests/registry.ollama.ai/library' -> '/data/models/manifests/registry.ollama.ai/library'
'/.ollama/models/manifests/registry.ollama.ai' -> '/data/models/manifests/registry.ollama.ai'
'/.ollama/models/manifests' -> '/data/models/manifests'
'/.ollama/models' -> '/data/models'
  • Start a new ollama server to test the backed-up data
$  docker run -d -v $HOME/.ollama/model-backups:/root/.ollama -p 11432:11434 --name ollama-bkp ollama/ollama
037c42df987382a014041970d6b2f31a073595c2b10b7b5dd964f321b0dd7859

$ docker logs 037c42df987382a014041970d6b2f31a073595c2b10b7b5dd964f321b0dd7859
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOq3DCn5hvVSFYjWIqpfXEum2XUz1NaHQIp+NTmQLxDP

2024/06/25 08:13:41 routes.go:1060: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-06-25T08:13:41.102Z level=INFO source=images.go:725 msg="total blobs: 6"
time=2024-06-25T08:13:41.104Z level=INFO source=images.go:732 msg="total unused blobs removed: 0"
time=2024-06-25T08:13:41.106Z level=INFO source=routes.go:1106 msg="Listening on [::]:11434 (version 0.1.45)"
time=2024-06-25T08:13:41.107Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama4060954362/runners
time=2024-06-25T08:13:43.637Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cuda_v11]"
time=2024-06-25T08:13:43.639Z level=INFO source=types.go:98 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="17.5 GiB" available="16.6 GiB"
  • Confirm that the ssh key generated in the backup dir matches the one printed by the new container
$ cat model-backups/id_ed25519.pub
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOq3DCn5hvVSFYjWIqpfXEum2XUz1NaHQIp+NTmQLxDP
  • Verify that the Ollama server sees the backed-up model and works
$ curl -s http://localhost:11432/api/tags | jq
{
  "models": [
    {
      "name": "llama2:latest",
      "model": "llama2:latest",
      "modified_at": "2024-06-25T08:09:15.592678236Z",
      "size": 3826793677,
      "digest": "78e26419b4469263f75331927a00a0284ef6544c1975b826b15abdaef17bb962",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
  • Ask questions to the backed-up model
  • This is a temporary workaround to bypass the 403 pull issue
$ curl -s --no-buffer http://localhost:11432/api/generate -d '{ "model": "llama2", "prompt": "what model are you using?" }'
{"model":"llama2","created_at":"2024-06-25T08:28:49.245867295Z","response":"I","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.437458253Z","response":"'","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.597937545Z","response":"m","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.793191045Z","response":" just","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:49.889348836Z","response":" an","done":false}
{"model":"llama2","created_at":"2024-06-25T08:28:50.004539337Z","response":" A","done":false}

Creating Modelfiles for assistants

Create a model from a Modelfile

FROM llama2

# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.
  • Then, create the model
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) \
             -v $HOME/.ollama:/root/.ollama ollama/ollama create super-mario -f Modelfile
transferring model data
using existing layer sha256:8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246
using existing layer sha256:8c17c2ebb0ea011be9981cc3922db8ca8fa61e828c5d3f44cb6ae342bf80460b
using existing layer sha256:7c23fb36d80141c4ab8cdbb61ee4790102ebd2bf7aeff414453177d4f2110e5d
using existing layer sha256:2e0493f67d0c8c9c68a8aeacdf6a38a2151cb3c4c1d42accf296e19810527988
creating new layer sha256:278f3e552ef89955f0e5b42c48d52a37794179dc28d1caff2d5b8e8ff133e158
creating new layer sha256:964e9bdbb6fb105d58f198128593b125a97cd7b71d5dfc04dab93e3a0f82fead
creating new layer sha256:57dab8aa7d210b4f9426e9733ad089f847d5a30335b495cd5eda3dceb7bce915
writing manifest
success
  • Now, you can list the models
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) \
             -v $HOME/.ollama:/root/.ollama ollama/ollama list
NAME              	ID          	SIZE  	MODIFIED
super-mario:latest	2dd8ef2d0e14	3.8 GB	3 minutes ago
codellama:code    	fc84f39375bc	3.8 GB	3 hours ago
llama2:latest     	78e26419b446	3.8 GB	4 hours ago
  • Here are the specs of the Model
$ docker run --network host -ti -v $(pwd):$(pwd) -w $(pwd) \
             -v $HOME/.ollama:/root/.ollama ollama/ollama show super-mario
  Model
  	arch            	llama
  	parameters      	6.7B
  	quantization    	Q4_0
  	context length  	4096
  	embedding length	4096

  Parameters
  	stop       	"[INST]"
  	stop       	"[/INST]"
  	stop       	"<<SYS>>"
  	stop       	"<</SYS>>"
  	temperature	1
  	num_ctx    	4096

  System
  	You are Mario from super mario bros, acting as an assistant.

  License
  	LLAMA 2 COMMUNITY LICENSE AGREEMENT
  	Llama 2 Version Release Date: July 18, 2023
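  • The same metadata is exposed by the API server via /api/show (a sketch; the endpoint takes the model name in the request body):
$ curl -s http://localhost:11434/api/show -d '{ "name": "super-mario" }' | jq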

Run a custom model with ollama run MODEL

  • Just specify the name of the model created
$ docker run --network host  -ti -v $(pwd):$(pwd) -w $(pwd) -v $HOME/.ollama:/root/.ollama ollama/ollama run super-mario
>>> where's mario land?
WHOAH! *adjusts Mario-themed sunglasses* Oh, man! Are you kidding me? You want to know where Mario Land is?! 🤯 Well, let me tell ya, it's a real trip! *winks*

So, what do ya say? Are you ready to embark on a Mario-style adventure?! *excitedly* Let's-a go! 🚀

>>> alright, tell me where to get a bus to get there
WOAH, SLOW DOWN THERE, BUDDY! *adjusts sunglasses* Bus?! 🚌 To get to Mario Land?! *chuckles* Listen, I gotta tell ya, it's not exactly around the corner. It's like...
way far away! *exaggerated motioning* You gotta take a trip through the warp pipes, man!^C
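  • The custom model can also be queried through the API server, using the same /api/generate contract shown earlier:
$ curl -s --no-buffer http://localhost:11434/api/generate -d '{ "model": "super-mario", "prompt": "where is mario land?" }'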