A brief summary of how to run a llamafile on a GCP instance and connect to it from, e.g., gptel.el for convenient access to an LLM. The server costs about $1.20 per hour. We'll also show how to shut the server down automatically once no requests have been made for an hour.
Essentially, I'm presenting:

- A systemd service that starts our LLM on port 8081.
- A systemd service that monitors port 8081 and pipes requests to a file.
- A systemd service that watches that file and shuts down the instance once its last modification is more than an hour old.
- A small configuration snippet to connect to it from Emacs.
- Get a GCP instance with a GPU with 40 GB of VRAM, i.e. one of the A100 instances. You need about 70 GB of disk space, but not a lot of RAM or CPU. I recommend getting a spot instance, which cuts the price roughly to a third. The final price comes in at about $1.20 per hour.
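For concreteness, creating such an instance might look roughly like this; the instance name, zone, and image are placeholders, and `a2-highgpu-1g` is the machine type with a single 40 GB A100:

```sh
# Sketch: a spot A100 instance. --maintenance-policy=TERMINATE is required
# for GPU instances; pick an image with NVIDIA drivers or install them yourself.
gcloud compute instances create llm-server \
  --zone=us-central1-a \
  --machine-type=a2-highgpu-1g \
  --provisioning-model=SPOT \
  --maintenance-policy=TERMINATE \
  --boot-disk-size=70GB \
  --image-family=debian-12 \
  --image-project=debian-cloud
```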
- Obtain a llamafile, e.g. this one, downloading it with `wget`. Make sure you can run it (i.e. `chmod +x` it).
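For example (the URL is a placeholder for whichever llamafile you picked; the flags below reappear in the service file later on):

```sh
wget -O model.llamafile https://example.com/path/to/model.llamafile
chmod +x model.llamafile
# Smoke test: serve on port 8081, no browser, all layers offloaded to the GPU.
./model.llamafile --server --nobrowser --port 8081 -ngl 999
```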
- Copy over the service files and scripts (sketched below), and reload systemd with `sudo systemctl daemon-reload`.
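The service files themselves are not reproduced in this summary, so here is a rough sketch of what they might look like; unit names, file paths, and flags are assumptions, so adjust them to your setup. First, the service that launches the llamafile:

```ini
# /etc/systemd/system/launch_llamafile.service -- sketch; adjust the path to your llamafile
[Unit]
Description=Serve a llamafile on port 8081
After=network.target

[Service]
# --nobrowser: don't open a browser; -ngl 999: offload all layers to the GPU
ExecStart=/home/me/model.llamafile --server --nobrowser --port 8081 -ngl 999
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

One way to implement the "pipe requests to a file" service is a line-buffered tcpdump on the loopback device: every request to port 8081 appends to the file, so its modification time always reflects the last request:

```ini
# /etc/systemd/system/monitor_llamafile.service -- sketch
[Unit]
Description=Record activity on port 8081 for idle detection
After=network.target

[Service]
# Touch the file at startup so a fresh boot with no requests still times out.
ExecStartPre=/bin/touch /var/run/llm_last_request
ExecStart=/bin/sh -c 'tcpdump -l -i lo port 8081 > /var/run/llm_last_request'
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

And a small script plus unit that powers the machine off once the file is stale (a stopped instance is billed only for its disk):

```sh
#!/bin/sh
# /usr/local/bin/idle_shutdown.sh -- sketch: poll the activity file's
# modification time and power off once it is over 3600 s old.
FILE=/var/run/llm_last_request
while sleep 60; do
    [ -f "$FILE" ] || continue
    age=$(( $(date +%s) - $(stat -c %Y "$FILE") ))
    if [ "$age" -gt 3600 ]; then
        systemctl poweroff
    fi
done
```

```ini
# /etc/systemd/system/idle_shutdown.service -- sketch
[Unit]
Description=Power off after 1h without requests

[Service]
ExecStart=/usr/local/bin/idle_shutdown.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```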
- Then enable them all with `systemctl enable --now launch_llamafile.service` etc.
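With the unit names from the sketches above, that amounts to:

```sh
sudo systemctl enable --now launch_llamafile.service
sudo systemctl enable --now monitor_llamafile.service
sudo systemctl enable --now idle_shutdown.service
```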
- Forward port `8081` via SSH, i.e. `ssh -NL 8081:localhost:8081 instance.zone`.
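If you created the instance with gcloud, the same tunnel can be opened through it (instance name and zone as in the sketch above; flags after `--` are passed through to ssh):

```sh
gcloud compute ssh llm-server --zone=us-central1-a -- -NL 8081:localhost:8081
```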
- Configure your local client, e.g. in Emacs:

```elisp
(use-package! gptel
  :config
  (let ((backend (gptel-make-openai  ;Not a typo, same API as OpenAI
                  "llama-cpp"        ;Any name
                  :stream t          ;Stream responses
                  :protocol "http"
                  :host "localhost:8081" ;Llama.cpp server location, typically localhost:8080 for llamafile
                  :key nil           ;No key needed
                  :models '("test")))) ;Any name, doesn't matter for llama.cpp
    (setq-default gptel-backend backend
                  gptel-model "test")))
```
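With the tunnel running, `M-x gptel` then opens a chat buffer backed by the llamafile, and `gptel-send` submits a request from any buffer.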