Bootstrap knowledge of LLMs ASAP, with a bias toward GPT.
Avoid being a link dump. Try to provide only valuable, well-tuned information.
Neural network links before starting with transformers.
from bitsandbytes.nn.modules import Linear8bitLt, Linear4bit
from contextlib import contextmanager

def noop(x=None, *args, **kwargs):
    "Do nothing"
    return x

@contextmanager
def no_kaiming():
    old_iku = init.kaiming_uniform_
{-# LANGUAGE TypeSynonymInstances #-}
data Dual d = D Float d deriving Show
type Float' = Float
diff :: (Dual Float' -> Dual Float') -> Float -> Float'
diff f x = y'
  where D y y' = f (D x 1)
class VectorSpace v where
  zero :: v
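The dual-number idea in the Haskell fragment above (evaluate `f` at `D x 1` and read off the second component) can be sketched in Python. This is a minimal illustration and not a port of the gist: the fragment does not show its arithmetic instances, so the `+` and `*` rules and the helper `_lift` below are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (hypothetical Python analogue of `D`)."""
    value: float
    deriv: float

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u * v' + u' * v
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a constant, i.e. a Dual with derivative 0."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def diff(f, x):
    """Derivative of f at x: seed the derivative slot with 1 and read it back."""
    return f(Dual(float(x), 1.0)).deriv


# d/dt (t*t + 3*t) at t = 2 is 2*2 + 3 = 7
print(diff(lambda t: t * t + 3 * t, 2.0))
```

Only addition and multiplication are covered here; other operations would need their own derivative rules.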
The Salesforce CodeGen models are a family of large language models trained on a large amount of natural language data and then fine-tuned on specialized datasets of code. Models of size 350M, 2B, 6B, and 16B parameters are provided in three flavors:
Twitter thread: https://twitter.com/theshawwn/status/1456925974919004165
Hacker News thread: https://news.ycombinator.com/item?id=29128998
November 6, 2021
jnp.device_put(1) is deceptively simple to write in JAX. But on a TPU, what actually happens? How does a tensor containing the value 1 actually get onto a TPU?
Turns out, the answer is "C++", and a lot of it.
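For reference, the user-facing side of that call can be sketched as below. This assumes JAX is installed and uses the top-level `jax.device_put`; it runs on whatever backend is available (CPU if there is no TPU), and it shows only the Python entry point, not the C++ path the post goes on to trace.

```python
import jax

# List the devices JAX can see; on a machine without a TPU this is the CPU.
devices = jax.devices()
print(devices)

# Place the Python scalar 1 on the first available device.
x = jax.device_put(1, devices[0])

# The result is a zero-dimensional JAX array holding the value 1.
print(x, x.shape)
```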
#!/bin/bash
# Attempt to set up the Nvidia GeForce GT 710 on a Pi CM4.
#
# I have tried both armv7l and aarch64 versions of the proprietary driver, in
# addition to the nouveau open source driver (which needs to be compiled into
# a custom Raspberry Pi kernel).
#
# tl;dr - None of the drivers worked :P
This document was originally written several years ago. At the time I was working as an execution core verification engineer at Arm. The following points are coloured heavily by working in and around the execution cores of various processors. Apply a pinch of salt; points contain varying degrees of opinion.
It is still my opinion that RISC-V could be much better designed; though I will also say that if I were building a 32 or 64-bit CPU today I'd likely implement the architecture to benefit from the existing tooling.
Mostly based upon the RISC-V ISA spec v2.0. Some updates have been made for v2.2.
The RISC-V ISA has pursued minimalism to a fault. There is a large emphasis on minimizing instruction count, normalizing encoding, etc. This pursuit of minimalism has resulted in false orthogonalities (such as reusing the same instruction for branches, calls and returns) and a requirement for superfluous instructions which impacts code density both in terms of size and
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import Data.Attoparsec.Text as A
WAYLAND_PROTOCOLS=/usr/share/wayland-protocols
# wayland-scanner is a tool which generates C headers and rigging for Wayland
# protocols, which are specified in XML. wlroots requires you to rig these up
# to your build system yourself and provide them in the include path.
xdg-shell-protocol.h:
	wayland-scanner server-header \
		$(WAYLAND_PROTOCOLS)/stable/xdg-shell/xdg-shell.xml $@
xdg-shell-protocol.c: xdg-shell-protocol.h
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector as V
import System.CPUTime
import System.Environment
import Text.Printf
{- Implementation of the WROM algorithm for finding all
free trees of a given order. The algorithm is explained
here: