Roman Miroshnychenko romanvm
@rain-1
rain-1 / llama-home.md
Last active April 28, 2024 18:42
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet. It should also be possible to run fine-tuned versions with this (like Alpaca or Vicuna, I think; those versions are more focused on answering questions).

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work involved in llama.cpp. The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.
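To get a rough starting point for how many layers to put on the GPU, a back-of-the-envelope estimate can help. The sketch below is not part of llama.cpp; the function name, the 8000 MB model size and the 1024 MB reserve are illustrative assumptions, and it assumes the model's weights are spread evenly across the layers, which is only approximately true. LLaMA 13B has 40 transformer layers.

```python
def gpu_layers_that_fit(model_size_mb, n_layers, vram_mb, reserve_mb=1024):
    """Estimate how many transformer layers fit in VRAM.

    Simplifying assumption: every layer takes model_size_mb / n_layers.
    reserve_mb is VRAM kept free for scratch buffers and the desktop
    (an assumed figure, tune it for your own machine).
    """
    if model_size_mb <= 0 or n_layers <= 0:
        raise ValueError("model_size_mb and n_layers must be positive")
    usable_mb = max(vram_mb - reserve_mb, 0)
    # Integer arithmetic avoids floating point rounding surprises.
    layers = usable_mb * n_layers // model_size_mb
    return min(n_layers, layers)


# Hypothetical example: an 8000 MB quantized 13B file on a 6 GB card.
estimate = gpu_layers_that_fit(model_size_mb=8000, n_layers=40, vram_mb=6144)
print(f"Try offloading about {estimate} of 40 layers")
```

Treat the result as a first guess: start there, watch VRAM usage, and lower the number if you run out of memory.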

  • Clone llama.cpp from git. I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
@artur-kink
artur-kink / Controller.hpp
Created August 19, 2014 03:17
C++ Webserver using libmicrohttpd. A simple wrapper for libmicrohttpd to create websites using C++.
#ifndef _CONTROLLER_
#define _CONTROLLER_
#include <sstream>
/**
* Base controller for handling http requests.
*/
class Controller{
@kgaughan
kgaughan / threadpoolss.py
Created June 10, 2013 16:13
All the thread pool mixins for SocketServer are, well, not that good, so I knocked together my own. This one can cleanly shut down the pool
"""
Thread pool extensions to SocketServer.
"""
import Queue
import SocketServer
import sys
import threading
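
The preview stops at the imports, so the gist's actual implementation is not shown here. As a rough illustration of the idea in the description (a thread pool mix-in that shuts its pool down cleanly), here is a minimal Python 3 sketch using `socketserver` and `queue`. It is not the author's code: `ThreadPoolMixIn`, `pool_size`, `Echo` and `PooledTCPServer` are hypothetical names chosen for this example.

```python
import queue
import socketserver
import threading


class ThreadPoolMixIn:
    """Hypothetical mix-in: handle requests on a fixed pool of threads."""

    pool_size = 4  # assumed default, not taken from the gist

    def serve_forever(self, poll_interval=0.5):
        # The pool only exists while serve_forever() is running.
        self._requests = queue.Queue()
        self._workers = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(self.pool_size)
        ]
        for worker in self._workers:
            worker.start()
        try:
            super().serve_forever(poll_interval)
        finally:
            # Clean shutdown: one sentinel per worker, then wait for them.
            # Requests queued before the sentinels are still handled.
            for _ in self._workers:
                self._requests.put(None)
            for worker in self._workers:
                worker.join()

    def process_request(self, request, client_address):
        # Called by the server loop; hand the request to the pool.
        self._requests.put((request, client_address))

    def _worker(self):
        while True:
            item = self._requests.get()
            if item is None:  # sentinel: stop this worker
                break
            request, client_address = item
            try:
                self.finish_request(request, client_address)
            except Exception:
                self.handle_error(request, client_address)
            finally:
                self.shutdown_request(request)


class Echo(socketserver.BaseRequestHandler):
    """Toy handler for the example: replies with the upper-cased input."""

    def handle(self):
        self.request.sendall(self.request.recv(1024).upper())


class PooledTCPServer(ThreadPoolMixIn, socketserver.TCPServer):
    pass
```

To try it, create `PooledTCPServer(("127.0.0.1", 0), Echo)`, run `serve_forever()` in a thread, and call `shutdown()` when done. Note that `shutdown()` can return slightly before the workers have finished, so join the serving thread if you need to be sure the pool is gone.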