Skip to content

Instantly share code, notes, and snippets.

View metric-space's full-sized avatar

Luke Meyers metric-space

View GitHub Profile
@jph00
jph00 / fast_peft.py
Last active October 18, 2023 21:08
Make get_peft_model() fast
from bitsandbytes.nn.modules import Linear8bitLt, Linear4bit
from contextlib import contextmanager
def noop (x=None, *args, **kwargs):
"Do nothing"
return x
@contextmanager
def no_kaiming():
old_iku = init.kaiming_uniform_
@rain-1
rain-1 / LLM.md
Last active July 18, 2024 22:37
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP. With a bias/focus to GPT.

Avoid being a link dump. Try to provide only valuable well tuned information.

Prelude

Neural network links before starting with transformers.

@ttesmer
ttesmer / AD.hs
Last active July 22, 2024 05:21
Automatic Differentiation in 38 lines of Haskell using Operator Overloading and Dual Numbers. Inspired by conal.net/papers/beautiful-differentiation
{-# LANGUAGE TypeSynonymInstances #-}
data Dual d = D Float d deriving Show
type Float' = Float
diff :: (Dual Float' -> Dual Float') -> Float -> Float'
diff f x = y'
where D y y' = f (D x 1)
class VectorSpace v where
zero :: v
@moyix
moyix / CodeGen_GPTJ_Conversion.md
Last active January 5, 2024 12:50
How to convert the SalesForce CodeGen models to GPT-J

Using Linear Algebra to Convert a Large Code Model

Background

The SalesForce CodeGen models are a family of large language models trained on a large amount of natural language data and then fine-tuned on specialized datasets of code. Models of size 350M, 2B, 6B, and 16B parameters are provided in three flavors:

  • nl, the base model trained on The Pile, a large natural language dataset compiled by EleutherAI
  • multi, which is fine-tuned from the nl model on a dataset of code in multiple languages, scraped from GitHub, and
  • mono, which is fine-tuned from the multi model on Python code only.
@shawwn
shawwn / What happens when you allocate a JAX tensor on a TPU.md
Last active April 15, 2023 04:11
JAX C++ stack trace walkthrough for TpuExecutor_Allocate
@geerlingguy
geerlingguy / nvidia-gt710-arm-pi-setup.sh
Last active April 14, 2024 16:26
Set up the Nvidia GeForce GT 710 on Raspberry Pi Compute Module 4
#!/bin/bash
# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P

Foreward

This document was originally written several years ago. At the time I was working as an execution core verification engineer at Arm. The following points are coloured heavily by working in and around the execution cores of various processors. Apply a pinch of salt; points contain varying degrees of opinion.

It is still my opinion that RISC-V could be much better designed; though I will also say that if I was building a 32 or 64-bit CPU today I'd likely implement the architecture to benefit from the existing tooling.

Mostly based upon the RISC-V ISA spec v2.0. Some updates have been made for v2.2

Original Foreword: Some Opinion

The RISC-V ISA has pursued minimalism to a fault. There is a large emphasis on minimizing instruction count, normalizing encoding, etc. This pursuit of minimalism has resulted in false orthogonalities (such as reusing the same instruction for branches, calls and returns) and a requirement for superfluous instructions which impacts code density both in terms of size and

@haitlahcen
haitlahcen / puretypesystem.hs
Last active February 6, 2019 22:01
Pure Type System draft
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import Data.Attoparsec.Text as A
@ddevault
ddevault / Makefile
Last active February 20, 2024 14:17
Tiny Wayland compositor
WAYLAND_PROTOCOLS=/usr/share/wayland-protocols
# wayland-scanner is a tool which generates C headers and rigging for Wayland
# protocols, which are specified in XML. wlroots requires you to rig these up
# to your build system yourself and provide them in the include path.
xdg-shell-protocol.h:
wayland-scanner server-header \
$(WAYLAND_PROTOCOLS)/stable/xdg-shell/xdg-shell.xml $@
xdg-shell-protocol.c: xdg-shell-protocol.h
@Average-user
Average-user / FreeTrees.hs
Last active April 21, 2020 16:22
Implementation of the WROM algorithm for Constant Time generation of Free Trees
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector as V
import System.CPUTime
import System.Environment
import Text.Printf
{- Implementation of the WROM algorithm for finding all
free trees of a given order. The algorithm is explained
here: