"""
Hugging Face Dataset Verification Tool

This script verifies the integrity of locally downloaded Hugging Face datasets by comparing
SHA256 checksums from the remote repository with local file hashes.

USAGE:
    python verify_hf_download.py <repo_id> [local_path]

ARGUMENTS:

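The local half of the check described above can be sketched as follows. This is a minimal illustration using only the standard library, not the script's actual implementation: the helper names `sha256_of_file` and `matches_expected` are hypothetical, and how the expected checksum is fetched from the remote repository is not shown in this preview, so it is simply passed in as a string.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a local file's digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Small demonstration on a temporary file (stands in for a downloaded file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches_expected(demo_path, hashlib.sha256(b"hello").hexdigest())
os.remove(demo_path)
print(demo_ok)
```

Reading in fixed-size chunks rather than loading the whole file matters here because dataset and model shards are often larger than available memory.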
Learning AI

If you've never done any programming courses before, I'd recommend Harvard CS50.
You do need a good amount of math at some point, but there are too many possible math resources to easily include here.

Basics

3Blue1Brown's neural networks videos are fantastic introductions to the ideas here. And I hate videos.
Fast.ai has been the go-to starter resource for years. Highly recommended.

General

Courses

MIT's open course is a good survey course that is more theory-oriented.
Hugging Face's tutorials are well regarded for a number of specific applied topics.

import torch.nn as nn
import torch.nn.functional as F
import math
# N log N approximation for a multiplication by a random orthogonal matrix
# Mostly untested, depends on having fast_hadamard_transform installed via git
class FastFood(nn.Module):
    def __init__(self, size):
        super().__init__()
        # Check that size is a power of two
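The N log N idea in the comments above can be illustrated without the `fast_hadamard_transform` dependency. The sketch below is an assumption-laden simplification, not the gist's code: it applies a single random sign flip followed by a normalized Walsh-Hadamard transform in pure Python, which is one orthogonal building block of Fastfood-style constructions (the full construction composes more factors). The names `fwht` and `random_sign_hadamard` are hypothetical.

```python
import math
import random


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Each pass combines pairs h apart; log2(n) passes of n/2 butterflies.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def random_sign_hadamard(values, signs):
    """Apply y = H D x / sqrt(n), where D is a diagonal of +/-1 signs."""
    n = len(values)
    flipped = [s * v for s, v in zip(signs, values)]
    return [v / math.sqrt(n) for v in fwht(flipped)]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
signs = [rng.choice((-1.0, 1.0)) for _ in range(8)]
y = random_sign_hadamard(x, signs)

# An orthogonal map preserves the Euclidean norm.
norm_x = math.sqrt(sum(v * v for v in x))
norm_y = math.sqrt(sum(v * v for v in y))
print(abs(norm_x - norm_y) < 1e-9)
```

The power-of-two check mirrors the constraint noted in the class above: the butterfly structure only works when the length can be halved all the way down.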
# Originally by https://jerryxio.ng/
class MultiPositionRotary(nn.Module):
    def __init__(
        self,
        head_dim: int,
        pos_dim: int,
        min_freq: float,
        max_freq: float,
        frozen: bool = True,
Fyodorov, the Russian librarian, was a man of profound thought and even more profound ideas. His days were spent surrounded by the wisdom held within the pages of countless books in the Moscow library. He wasn't content with the mundane tasks of cataloging and shelving; Fyodorov harbored a grander vision, a destiny that transcended the earthly confines of Russia and the 宇宙 (yǔzhòu) [universe, cosmos, space].
His central philosophy, which he developed over many years, was called Cosmism, though it wasn't always given that specific name during his lifetime. It was a radical 融合 (rónghé) [fusion, blend, integration] of science, philosophy, and religion. Fyodorov believed that death wasn't an inevitable part of existence, but rather a 限制 (xiànzhì) [limitation, restriction, constraint] that humanity could and should overcome. He envisioned a future where scientific and technological advancements allowed humans to achieve physical immortality, not just for themselves, but potentially for their ancestors as well.
T
177MF LLC PITTS MODEL 12
2007 SAVAGE AIR LLC EPIC LT
2021FX3 LLC CCX-2000
3XTRIM 450 ULTRA
5 RIVERS LLC SQ-2
737 800
777 FF2
781569 INC FX 210
A VAN NIEROP RISEN 915 IS
A. SCHLEICHER GMBH & CO. ASW 27-18
@segyges
segyges / hf_byte_tokenizer.json
Created March 31, 2024 19:45
Should be a good Hugging Face tokenizer file for a pure byte tokenizer
{
  "version": "1.0",
  "truncation": null,
  "padding": null,
  "added_tokens": [
    {
      "id": 0,
      "content": "<|endoftext|>",
      "single_word": false,
      "lstrip": false,
segyges / lfs_parity_checker.sh
Last active March 3, 2024 01:30
#!/bin/bash
lfs_files_long=$(git lfs ls-files --long)
checked=0
bad=0
while IFS= read -r line; do
    # Read each part of the line into separate variables
    read -r hash separator filename <<< "$line"
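The three-way split done by `read -r hash separator filename` can be mirrored in Python as a rough sketch. Everything here is illustrative rather than taken from the gist: the sample lines are made up, the helper names are hypothetical, and `actual_hashes` stands in for however the real script obtains each file's current digest (that part of the script is not shown in this preview).

```python
def parse_lfs_line(line):
    """Split one line into (hash, separator, filename), keeping spaces in the filename."""
    # maxsplit=2 leaves the remainder intact, like the last variable of `read`.
    parts = line.strip().split(None, 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected line: {line!r}")
    oid, separator, filename = parts
    return oid, separator, filename


def count_mismatches(lines, actual_hashes):
    """Return (checked, bad) given a hypothetical filename -> hex digest mapping."""
    checked = 0
    bad = 0
    for line in lines:
        oid, _separator, filename = parse_lfs_line(line)
        checked += 1
        if actual_hashes.get(filename) != oid:
            bad += 1
    return checked, bad


# Made-up sample lines in an assumed "hash marker path" layout.
sample_lines = [
    "0a1b2c * models/my model.bin",
    "ffee99 * data/train.parquet",
]
sample_hashes = {"models/my model.bin": "0a1b2c", "data/train.parquet": "000000"}
print(count_mismatches(sample_lines, sample_hashes))
```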
segyges / ubuntu-nvidia-container-setup.sh
Created December 31, 2023 19:18
Script for (re)installing NVIDIA drivers, Docker, and nvidia-container-toolkit
#!/bin/sh
# you may need to bounce the machine after this before it will work
# success is tested by running: sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
# drivers
sudo ubuntu-drivers install
# install docker
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
sudo apt-get update
segyges / ff_modifications_2.py
Last active November 17, 2023 05:18
# The last one was getting too long
import torch
import torch.nn as nn
class FeedForwardLayer(nn.Module):
    def __init__(self, d_model, nonlinearity):
        super(FeedForwardLayer, self).__init__()
        self.linear1 = nn.Linear(d_model, d_model*6)
        self.linear2 = nn.Linear(d_model, d_model)