@rain-1
rain-1 / llama-home.md
Last active June 19, 2024 03:05
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text prediction model similar to GPT-2, and to the version of GPT-3 that has not been fine-tuned yet. It should also be possible to run fine-tuned versions (like Alpaca or Vicuna) with this; those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low VRAM. A rough Python sketch of the same layer-offloading idea follows the steps below.

  • Clone llama.cpp from git; I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.
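As a rough illustration of the layer-offloading idea in Python (not part of the original instructions, which use the llama.cpp command-line tools), the llama-cpp-python bindings expose the number of GPU layers as a constructor argument. The model path and layer count below are placeholder assumptions you would adjust to your model and VRAM:

from llama_cpp import Llama  # pip install llama-cpp-python (built with cuBLAS enabled)

# Hypothetical values: offload 18 of the 13B model's transformer layers to the
# GPU and keep the rest on the CPU to stay within ~6GB of VRAM.
llm = Llama(model_path="./models/13B/ggml-model-q4_0.bin", n_gpu_layers=18)

out = llm("Building a website can be done in 10 simple steps:", max_tokens=64)
print(out["choices"][0]["text"])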
@neubig
neubig / dispatch_openai_requests.py
Last active February 19, 2024 17:55
A simple script to get results from the OpenAI Asynchronous API
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py
import openai
import asyncio
from typing import Any
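The full gist goes further, but a minimal sketch of the dispatching idea looks roughly like this, assuming the pre-1.0 openai Python package (which provides the asynchronous ChatCompletion.acreate method); the function signature and defaults here are illustrative, not the gist's exact code:

async def dispatch_openai_requests(
    messages_list: list[list[dict[str, Any]]],
    model: str = "gpt-3.5-turbo",
) -> list[Any]:
    # Create one request per conversation and await them all concurrently.
    tasks = [
        openai.ChatCompletion.acreate(model=model, messages=messages)
        for messages in messages_list
    ]
    return await asyncio.gather(*tasks)

# Example usage (requires openai.api_key to be set):
# responses = asyncio.run(dispatch_openai_requests(
#     [[{"role": "user", "content": "Say hello."}],
#      [{"role": "user", "content": "Say goodbye."}]]))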
@yuchenlin
yuchenlin / masked_word_prediction_bert.py
Last active August 15, 2023 17:30
A simple example script for predicting masked words in a sentence using BERT.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM
import logging
logging.basicConfig(level=logging.INFO)  # OPTIONAL
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
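To complete the picture, here is a minimal sketch of the masked-word prediction step itself (not the gist's exact code, and assuming a transformers 4.x-style API where the model output exposes .logits):

# Predict the word at the [MASK] position in an example sentence.
text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
with torch.no_grad():
    logits = model(**inputs).logits
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # likely "paris"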
@eerwitt
eerwitt / load_jpeg_with_tensorflow.py
Created January 31, 2016 05:52
Example of loading multiple JPEG files with TensorFlow and making them available as Tensors with the shape [[R, G, B], ... ].
# Typical setup to include TensorFlow.
import tensorflow as tf
# Make a queue of file names including all the JPEG images files in the relative
# image directory.
filename_queue = tf.train.string_input_producer(
    tf.train.match_filenames_once("./images/*.jpg"))
# Read an entire image file, which is required since they're JPEGs; if the images
# are too large they could be split in advance into smaller files or use the Fixed