Peter Whidden PWhiddy

## llama-home.md

      
rain-1 / llama-home.md (last active June 19, 2024 03:05)

How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

Llama is a text prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet.
I think it is also possible to run fine-tuned versions with this (like Alpaca or Vicuna). Those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run Llama 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work in llama.cpp. The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to run on the GPU. This is perfect for low VRAM.
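To get a rough feel for how many layers might fit on a small card, here is a minimal Python sketch. It is not part of llama.cpp. The function name `layers_that_fit`, the 8 GiB model size, and the 1 GiB reserve are illustrative assumptions. It also assumes every layer takes an equal share of the model file and ignores the KV cache and scratch buffers, so treat the result as a starting guess and lower it if you run out of memory.

```python
GIB = 2**30


def layers_that_fit(model_bytes, n_layers, vram_bytes, reserve_bytes=0):
    """Estimate how many transformer layers fit in VRAM.

    Assumption: each layer occupies an equal share of the model file.
    reserve_bytes is VRAM held back for the desktop, KV cache, etc.
    """
    if model_bytes <= 0 or n_layers <= 0:
        raise ValueError("model_bytes and n_layers must be positive")
    usable = max(vram_bytes - reserve_bytes, 0)
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    return min(n_layers, usable * n_layers // model_bytes)


if __name__ == "__main__":
    # Hypothetical numbers: an 8 GiB quantized model file with 40 layers,
    # a 6 GiB card, and 1 GiB kept free.
    print(layers_that_fit(8 * GIB, 40, 6 * GIB, reserve_bytes=1 * GIB))
```

The estimate would then be used as the number of layers to offload (in llama.cpp builds that have it, I believe this is the `--n-gpu-layers` option, but check `--help` for your commit).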

Clone llama.cpp from git. I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.


## consensus.py
"""
Toy demonstration of chain-of-thought and consensus prompting using OpenAI API.

© Riley Goodside 2022
"""

import os
import re
from statistics import mode
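
The preview above is cut off after the imports, so the gist's actual code is not shown here. As a guess at the general idea only, consensus prompting can be sketched as sampling several chain-of-thought completions, pulling the final answer out of each, and keeping the most common one. The regex, the function names, and the sample completions below are all hypothetical, and no OpenAI API call is made.

```python
import re
from statistics import mode

# Assumed answer format: each completion ends with "... the answer is <integer>".
ANSWER_RE = re.compile(r"answer is\s*(-?\d+)", re.IGNORECASE)


def extract_answer(completion):
    """Return the last integer answer found in a completion, or None."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def consensus(completions):
    """Majority vote over the answers extracted from several completions."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        raise ValueError("no completion contained a recognizable answer")
    # On Python 3.8+, mode() returns the first of several tied values.
    return mode(answers)


if __name__ == "__main__":
    samples = [
        "3 boxes of 4 is 12. The answer is 12.",
        "4 + 4 + 4 + 3 = 15. So the answer is 15",
        "Three times four. Therefore, the answer is 12",
    ]
    print(consensus(samples))
```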

## AD.hs
{-# LANGUAGE TypeSynonymInstances #-}
data Dual d = D Float d deriving Show
type Float' = Float

diff :: (Dual Float' -> Dual Float') -> Float -> Float'
diff f x = y'
  where D y y' = f (D x 1)

class VectorSpace v where
  zero  :: v

## killbutmakeitlooklikeanaccident.sh
#!/bin/bash

gdb -p "$1" -batch -ex 'set {short}$rip = 0x050f' -ex 'set $rax=231' -ex 'set $rdi=0' -ex 'cont'

## vectorized-atan2f.cpp
// Copyright (c) 2021 Francesco Mazzoli <f@mazzo.li>
//
// Permission to use, copy, modify, and distribute this software for any
// purpose with or without fee is hereby granted, provided that the above
// copyright notice and this permission notice appear in all copies.
//
// THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
// WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
// ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES

## add_debug_entitlement.sh
#! /bin/bash
# Simple Utility Script for allowing debug of hardened macOS apps.
# This is mostly useful for plug-in developers who would like to keep developing without turning SIP off.
# Credit for idea goes to (McMartin): https://forum.juce.com/t/apple-gatekeeper-notarised-distributables/29952/57?u=ttg
# Update 2022-03-10: Based on Fabian's feedback, add capability to inject DYLD for sanitizers.
#
# Please note:
# - Modern Logic (on M1s) uses `AUHostingService`, which resides within the system and is thus not patchable; debugging it REQUIRES turning SIP off.
# - Some hosts use separate plug-in scanning or sandboxing.
#     If that's the case, it's required to patch those (if needed) and attach the debugger to them instead.

## macapp.go
// Package main is a sample macOS-app-bundling program to demonstrate how to
// automate the process described in this tutorial:
//
// https://medium.com/@mattholt/packaging-a-go-application-for-macos-f7084b00f6b5
//
// Bundling the .app is the first thing it does, and creating the DMG is the
// second. Making the DMG is optional, and is only done if you provide
// the template DMG file, which you have to create beforehand.
//
// Example use:

## pg-pong.py
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
import numpy as np
import cPickle as pickle
import gym

# hyperparameters
H = 200 # number of hidden layer neurons
batch_size = 10 # every how many episodes to do a param update?
learning_rate = 1e-4
gamma = 0.99 # discount factor for reward

## cuda_check.c
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime_api.h>

/* Outputs some information on CUDA-enabled devices on your computer,
 * including compute capability and current memory usage.
 *
 * On Linux, compile with: nvcc -o cuda_check cuda_check.c -lcuda
 * On Windows, compile with: nvcc -o cuda_check.exe cuda_check.c -lcuda
 *

## generate_hcn.py
#!/usr/bin/env python3

# This program prints all hcn (highly composite numbers) <= MAXN (=10**18)
#
# The value of MAXN can be changed arbitrarily. When MAXN = 10**100, the
# program needs less than one second to generate the list of hcn.
from math import log

MAXN = 10**18