I hereby claim:
- I am maxl on github.
- I am defuzed (https://keybase.io/defuzed) on keybase.
- I have a public key ASC7lW6pw740h058Mo2q11zj53WIfI0fInWtdhH18QU2Xgo
 
To claim this, I am signing this object:
This worked on 14/May/23. The instructions will probably require updating in the future.
LLaMA is a text prediction model, similar to GPT-2 and to the base version of GPT-3 that has not been fine-tuned yet. It should also be possible to run fine-tuned versions (like Alpaca or Vicuna) with this, I think; those versions are more focused on answering questions.
Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.
It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low VRAM.
The llama.cpp commit in use at the time of writing was 08737ef720f0510c7ec2aa84d7f70c691073c35d.
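As a rough illustration of the workflow, here is a minimal sketch of building llama.cpp with cuBLAS and offloading some layers to the GPU. The model path, layer count, and thread count are illustrative assumptions; adjust them to your own model and hardware.

```bash
# Build llama.cpp with cuBLAS support and offload part of the model to the GPU.
# The model filename and the -ngl / -t values below are assumptions for
# illustration; tune -ngl until the model fits in your VRAM.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 08737ef720f0510c7ec2aa84d7f70c691073c35d  # commit mentioned above
make LLAMA_CUBLAS=1

# Run 13B with some transformer layers on the GPU (-ngl / --n-gpu-layers).
./main -m ./models/13B/ggml-model-q4_0.bin \
       -ngl 20 -t 8 \
       -p "Building a website can be done in 10 simple steps:"
```

Increasing `-ngl` moves more layers into VRAM and speeds up generation; lower it if you run out of GPU memory.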