Luke Meyers metric-space

## fast_peft.py
from bitsandbytes.nn.modules import Linear8bitLt, Linear4bit
from contextlib import contextmanager

def noop (x=None, *args, **kwargs):
    "Do nothing"
    return x

@contextmanager
def no_kaiming():
    old_iku = init.kaiming_uniform_

## LLM.md

      
rain-1 / LLM.md
2 files · 161 forks · 13 comments · 1614 stars · Last active July 18, 2024 22:37

LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP, with a focus on GPT.
Avoid being a link dump. Try to provide only valuable, well-tuned information.
Prelude

Neural network links before starting with transformers.

  
## AD.hs
{-# LANGUAGE TypeSynonymInstances #-}
data Dual d = D Float d deriving Show
type Float' = Float

diff :: (Dual Float' -> Dual Float') -> Float -> Float'
diff f x = y'
  where D y y' = f (D x 1)

class VectorSpace v where
  zero  :: v
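
The Haskell preview above pairs a value with a derivative (`D y y'`) and seeds the derivative with 1 in `diff`. As a rough illustration of the same forward-mode idea, here is a minimal Python sketch. It assumes a scalar derivative rather than the general vector-space tangent the Haskell type allows, and the `Dual` class, `_lift`, and `diff` below are illustrative names, not a port of the author's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    value: float  # primal value y
    deriv: float  # derivative y' carried alongside it

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def _lift(x):
    # Treat plain numbers as constants, whose derivative is zero.
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)

def diff(f, x):
    # Seed the derivative with 1, as in `f (D x 1)`, and read it back out.
    return f(Dual(float(x), 1.0)).deriv
```

For example, `diff(lambda x: x * x + 3 * x, 2.0)` evaluates the derivative `2x + 3` at `x = 2`.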

## CodeGen_GPTJ_Conversion.md

      
moyix / CodeGen_GPTJ_Conversion.md
1 file · 2 forks · 1 comment · 56 stars · Last active January 5, 2024 12:50

How to convert the SalesForce CodeGen models to GPT-J

Using Linear Algebra to Convert a Large Code Model

Background

The SalesForce CodeGen models are a family of large language models trained on a large amount of natural language data and then fine-tuned on specialized datasets of code. Models of size 350M, 2B, 6B, and 16B parameters are provided in three flavors:

- nl, the base model trained on The Pile, a large natural language dataset compiled by EleutherAI,
- multi, which is fine-tuned from the nl model on a dataset of code in multiple languages, scraped from GitHub, and
- mono, which is fine-tuned from the multi model on Python code only.


## What happens when you allocate a JAX tensor on a TPU.md

      
shawwn / What happens when you allocate a JAX tensor on a TPU.md
1 file · 2 forks · 0 comments · 22 stars · Last active April 15, 2023 04:11

JAX C++ stack trace walkthrough for TpuExecutor_Allocate

Twitter thread: https://twitter.com/theshawwn/status/1456925974919004165

Hacker News thread: https://news.ycombinator.com/item?id=29128998
November 6, 2021
How does JAX allocate memory on a TPU?

jnp.device_put(1) is deceptively simple to write in JAX. But on a TPU, what actually happens? How does a tensor containing the value 1 actually get onto a TPU?
Turns out, the answer is "C++", and a lot of it.

  
## nvidia-gt710-arm-pi-setup.sh
#!/bin/bash

# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P

## RISC-V.md

      
erincandescent / RISC-V.md
1 file · 11 forks · 21 comments · 220 stars · Created July 25, 2019 23:32

Foreword

This document was originally written several years ago. At the time I was working as an execution core verification engineer at Arm. The following points are coloured heavily by working in and around the execution cores of various processors. Apply a pinch of salt; points contain varying degrees of opinion.
It is still my opinion that RISC-V could be much better designed; though I will also say that if I was building a 32 or 64-bit CPU today I'd likely implement the architecture to benefit from the existing tooling.
Mostly based upon the RISC-V ISA spec v2.0. Some updates have been made for v2.2.
Original Foreword: Some Opinion

The RISC-V ISA has pursued minimalism to a fault. There is a large emphasis on minimizing instruction count, normalizing encoding, etc. This pursuit of minimalism has resulted in false orthogonalities (such as reusing the same instruction for branches, calls and returns) and a requirement for superfluous instructions which impacts code density both in terms of size and

  
## puretypesystem.hs
{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs             #-}
{-# LANGUAGE KindSignatures    #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import           Control.Applicative
import           Data.Attoparsec.Text as A

## Makefile
WAYLAND_PROTOCOLS=/usr/share/wayland-protocols

# wayland-scanner is a tool which generates C headers and rigging for Wayland
# protocols, which are specified in XML. wlroots requires you to rig these up
# to your build system yourself and provide them in the include path.
xdg-shell-protocol.h:
	wayland-scanner server-header \
		$(WAYLAND_PROTOCOLS)/stable/xdg-shell/xdg-shell.xml $@

xdg-shell-protocol.c: xdg-shell-protocol.h

## FreeTrees.hs
{-# LANGUAGE BangPatterns #-}

import qualified Data.Vector as V
import           System.CPUTime
import           System.Environment
import           Text.Printf

{- Implementation of the WROM algorithm for finding all
   free trees of a given order. The algorithm is explained
   here: