Keunwoo Choi keunwoochoi

## llama-home.md

      
How to run Llama 13B with a 6GB graphics card

Created May 25, 2023 20:11, forked from rain-1/llama-home.md

This worked on 14/May/23. The instructions will probably require updating in the future.

Llama is a text prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet.
It should also be possible to run fine-tuned versions (like Alpaca or Vicuna) with this, I think. Those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.
It is now possible to run Llama 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work that has gone into llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of the transformer layers to run on the GPU. This is perfect for low VRAM.
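
As a rough way to choose how many layers to put on the GPU, here is a minimal Python sketch. It assumes every transformer layer takes an equal share of the model file, which ignores the embeddings, the KV cache and scratch buffers, so treat the result as an upper bound to start from rather than a guarantee. The function name `layers_that_fit`, the 8 GiB file size and the 5 GiB budget are made up for illustration; the 40-layer count is that of Llama 13B.

```python
GIB = 2**30


def layers_that_fit(model_bytes: int, n_layers: int, vram_budget_bytes: int) -> int:
    """Estimate how many transformer layers fit in a VRAM budget.

    Assumption: each layer uses model_bytes / n_layers bytes. Real usage is
    higher because of the KV cache and scratch buffers.
    """
    if model_bytes <= 0 or n_layers <= 0:
        raise ValueError("model_bytes and n_layers must be positive")
    if vram_budget_bytes <= 0:
        return 0
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    fit = (vram_budget_bytes * n_layers) // model_bytes
    # Never report more layers than the model actually has.
    return min(n_layers, fit)


if __name__ == "__main__":
    # Hypothetical numbers: an 8 GiB quantized model file with 40 layers,
    # and 5 GiB of a 6GB card left over for weights.
    print(layers_that_fit(8 * GIB, 40, 5 * GIB))
```

If the estimate runs out of memory in practice, lower the number and try again.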

Clone llama.cpp from git. I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.


## grad_lib.py
"""This is a conversion of https://gist.github.com/yang-song/07392ed7d57a92a87968e774aef96762
to Tensorflow 2 using GradientTape
"""
import tensorflow as tf


@tf.function
def gradients(f, x, tape, grad_ys=None):
    '''
    An easier way of computing gradients in tensorflow. The difference from tf.gradients is

## fancy_youtube_encode.sh
# Based on example here https://trac.ffmpeg.org/wiki/Encode/YouTube
# Usage: fancy_youtube_encode.sh input.wav
# The output file name and the on-screen title are the input name without ".wav".
# Variables are quoted so that file names containing spaces still work.
text=$(basename "$1" .wav)
ffmpeg -i "$1" -filter_complex \
"[0:a]avectorscope=s=640x518,pad=1280:720[vs]; \
[0:a]showspectrum=mode=separate:color=intensity:scale=cbrt:s=640x518[ss]; \
[0:a]showwaves=s=1280x202:mode=line[sw]; \
[vs][ss]overlay=w[bg]; \
[bg][sw]overlay=0:H-h,drawtext=fontfile=/usr/share/fonts/truetype/fonts-japanese-gothic.ttf:fontcolor=white:x=10:y=10:text=$text[out]" \
-map "[out]" -map 0:a -c:v libx264 -preset fast -crf 18 -c:a copy "$text.mkv"

## pescador_example.py
#!/usr/bin/env python

import numpy as np
import pescador

def data_sampler(filename, n_samples):

    # Load all the data from the file, somehow
    data = load_data(filename)