A brief summary of how to run a llamafile on a GCP instance and connect to it from a client such as gptel.el for convenient access to an LLM. The server costs about $1.20 per hour, so we'll also show how to shut it off automatically once no requests have been made for 1h.
Essentially, I'm presenting:
- A systemd service that starts our LLM on port 8081.
- A systemd service that monitors port 8081 and pipes requests to a file.
- A systemd service that monitors the file and shuts down the instance once the file's last modification is more than 1h old.
- A small configuration snippet to connect to it from Emacs.
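
To make the idle-shutdown idea concrete, here is a minimal sketch of the check the third service performs, written in Python purely for illustration. The file path, the polling interval, and the function names are hypothetical, and the sketch assumes that powering off the OS with `systemctl poweroff` is enough to stop the instance.

```python
import os
import subprocess
import time

IDLE_LIMIT_SECONDS = 60 * 60  # 1h without requests, as described above
ACTIVITY_FILE = "/tmp/llm-requests.log"  # hypothetical path of the request log


def seconds_since_modified(path, now=None):
    """Return how many seconds have passed since `path` was last modified."""
    if now is None:
        now = time.time()
    # Raises FileNotFoundError if the request log does not exist yet.
    return now - os.path.getmtime(path)


def is_idle(path, limit=IDLE_LIMIT_SECONDS, now=None):
    """True when the last modification of `path` is older than `limit` seconds."""
    return seconds_since_modified(path, now) > limit


def watch(path=ACTIVITY_FILE, poll_seconds=60):
    """Poll the request log and power off the machine once it has gone idle."""
    while not is_idle(path):
        time.sleep(poll_seconds)
    # Assumption: the process runs with enough privileges to power off.
    subprocess.run(["systemctl", "poweroff"], check=True)
```

A service would simply call `watch()`. Keeping the time comparison in `is_idle` separate from the shutdown call makes it possible to check the logic without actually powering anything off.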