@shawwn
shawwn / since2010.md
Created May 11, 2021 09:46
"What happened after 2010?"

This was a response to a Hacker News comment asking me what I've been up to since 2010. I'm posting it here since HN rejects it with "that comment is too long." I suppose that's fair, since this ended up being something of an autobiography.

--

What happened after 2010?

@shawwn
shawwn / llama_sizes.txt
Created March 5, 2023 18:07
The size in bytes of each file distributed with LLaMA, for reference. See https://github.com/shawwn/llama-dl
./tokenizer_checklist.chk 50
./tokenizer.model 499723
./7B/checklist.chk 100
./7B/consolidated.00.pth 13476939516
./7B/params.json 101
./13B/checklist.chk 154
./13B/consolidated.00.pth 13016334699
./13B/consolidated.01.pth 13016334699
./13B/params.json 101
./30B/checklist.chk 262
@shawwn
shawwn / JAX_compliation_cache.md
Last active January 2, 2024 15:46
JAX persistent compilation cache

JAX released a persistent compilation cache for TPU VMs! When enabled, the cache writes compiled JAX computations to disk so they don’t have to be re-compiled the next time you start your JAX program. This can save startup time if any of y’all have long compilation times.

First upgrade to the latest jax release:

pip install -U "jax[tpu]>=0.2.18" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

Then use the following to enable the cache in your jax code:

from jax.experimental.compilation_cache import compilation_cache as cc
import os
import sys
import re
import time
import psutil
import platform
import json
import collections
from subprocess import check_output
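The preview above cuts off before the call that actually turns the cache on (the extra imports belong to the longer script in the gist). A minimal sketch of the enabling call, with the cache directory path given only as an example:

from jax.experimental.compilation_cache import compilation_cache as cc

# Compiled JAX computations are written under this directory and reused on later runs.
cc.initialize_cache("/tmp/jax_cache")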
@shawwn
shawwn / llama.md
Last active July 29, 2023 19:32
A transcript of an interview I did for The Verge on March 6, 2023, about LLaMA, Facebook's new 65-billion-parameter language model that was recently leaked to the internet: https://news.ycombinator.com/item?id=35007978

The Verge: "Meta’s powerful AI language model has leaked online — what happens now?"


Could you confirm that you downloaded the LLaMA series from 4chan? Were you able to get it running yourself or did you just repackage the download? (I was a bit confused reading your tweets about what exactly you'd done there, so if you're able to explain that, it'd be great.)

I downloaded it from Facebook, actually. You can find some details here.

Basically, the sequence of events was:

@shawwn
shawwn / glob.cpp
Created June 15, 2023 06:16
A simple glob function for POSIX that returns a vector of strings
#include <glob.h>
#include <vector>
#include <string>
namespace util
{
std::vector<std::string> glob(const std::string& pattern) {
    glob_t glob_result = {0}; // zero initialize
    std::vector<std::string> filenames;
    if (::glob(pattern.c_str(), 0, nullptr, &glob_result) == 0) // collect matching paths
        for (size_t i = 0; i < glob_result.gl_pathc; ++i)
            filenames.push_back(glob_result.gl_pathv[i]);
    globfree(&glob_result); // release glob's internal allocations
    return filenames;
}
} // namespace util
@shawwn
shawwn / 65b_samples.txt
Last active May 18, 2023 06:35
Some LLaMA 65B outputs after fixing the sampler settings.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{"seed": 374894, "temp": 0.7, "top_p": 0.0, "top_k": 40, "repetition_penalty": 1.1764705882352942, "max_seq_len": 512, "max_gen_len": 511}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Loading
Loaded in 8.72 seconds
============== sample 1 =================
I believe the meaning of life is to grow, learn and give.
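For readers puzzling over the settings line above, here is a rough sketch of how knobs like these are commonly applied when picking the next token. This is a generic illustration, not the gist's actual sampler; with top_p at 0.0, nucleus sampling is effectively off, so only the temperature, top-k, and repetition penalty matter:

import numpy as np

def sample_next_token(logits, prev_tokens, temp=0.7, top_k=40,
                      repetition_penalty=1.1764705882352942, rng=None):
    rng = rng or np.random.default_rng()
    logits = np.array(logits, dtype=np.float64)
    # Repetition penalty (CTRL-style): make tokens that already appeared less likely.
    for t in set(prev_tokens):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty
    # Temperature: values below 1.0 sharpen the distribution.
    logits = logits / temp
    # Top-k: keep only the top_k highest-scoring tokens.
    kth_best = np.sort(logits)[-top_k]
    logits[logits < kth_best] = -np.inf
    # Renormalize and sample.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))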
@shawwn
shawwn / cmake_test.cmake
Created May 12, 2023 04:46
I was playing around with writing a lisp-to-cmake compiler. https://github.com/shawwn/pymen/tree/cmake
cmake_policy(VERSION "3.25.0")
set(reserved
ALL
"=" ON
"==" ON
"+" ON
"_" ON
"%" ON
"*" ON
"/" ON
@shawwn
shawwn / What happens when you allocate a JAX tensor on a TPU.md
Last active April 15, 2023 04:11
JAX C++ stack trace walkthrough for TpuExecutor_Allocate
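For context, a minimal Python-level way to trigger that kind of allocation (assuming a TPU VM; this is only an illustration of the entry point, not part of the walkthrough itself):

import jax
import jax.numpy as jnp

# Place an array on the first TPU device; the backing buffer is allocated on the TPU.
tpu = jax.devices("tpu")[0]
x = jax.device_put(jnp.ones((1024, 1024)), tpu)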
You have fallen into Event Horizon with John Michael Godier.
In today's episode, John is joined by Shawn Presser.
Shawn Presser is an AI researcher and machine learning engineer.
He has contributed to projects such as The Pile, an open-source
training data set for large language models.
He currently works on research and development for AGI.