Simo Ryu cloneofsimo

import os
import torch
import json
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from diffusers.models import AutoencoderKL
from streaming import MDSWriter
import logging
import time
import os
import torch
import json
from PIL import Image
from torch.utils.data import Dataset
from diffusers.models import AutoencoderKL
from streaming import MDSWriter
import logging
import time
@cloneofsimo
cloneofsimo / sample.py
Created July 1, 2024 20:04
self-contained sampling code
# MM-DiT model, as proposed in the SD3 paper.
# I've tried to make this follow the MuP work, so that it scales in the maximal feature-learning regime.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
@cloneofsimo
cloneofsimo / merge_mds_fast.py
Created June 24, 2024 00:59
MDS-Multiprocessed-datamerging to NFS, because writing is async this is faster
import os
import json
from glob import glob
from tqdm import tqdm
from multiprocessing import Pool, Manager, cpu_count
def with_id(basename: str, shard_id: int) -> str:
    parts = basename.split(".")
    parts[1] = f"{shard_id:07}"
    return ".".join(parts)
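A quick check of what `with_id` does, as a minimal sketch. It assumes shard files are named like `shard.00000.mds` (an illustrative name, not taken from the gist): the second dot-separated field is replaced by the new shard id, zero-padded to seven digits.

```python
def with_id(basename: str, shard_id: int) -> str:
    """Replace the second dot-separated field with a 7-digit, zero-padded shard id."""
    parts = basename.split(".")
    parts[1] = f"{shard_id:07}"
    return ".".join(parts)


# Hypothetical file name, used only for illustration.
renamed = with_id("shard.00000.mds", 12)
print(renamed)  # shard.0000012.mds
```

Only the field at index 1 changes, so any extra suffix after it (for example a compression extension) is carried over untouched.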
@cloneofsimo
cloneofsimo / low-rank-transpose-inv-similarity.py
Created June 3, 2024 17:14
Is your backprop secretly linear solver?
# motivated by https://x.com/yaroslavvb/status/1797662470859071892
import matplotlib.pyplot as plt
import numpy as np
def cosine_similarity(v1, v2):
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
def stable_rank(matrix):
    s = np.linalg.svd(matrix, compute_uv=False)
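The preview cuts off inside `stable_rank`, so the gist's actual body is not shown here. As a hedged sketch, the version below assumes the common definition of stable rank, the squared Frobenius norm divided by the squared spectral norm, computed from the singular values; the gist may compute it differently. The name `stable_rank_sketch` is hypothetical.

```python
import numpy as np


def cosine_similarity(v1, v2):
    # Cosine of the angle between two vectors.
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))


def stable_rank_sketch(matrix):
    # Assumed definition: ||A||_F^2 / ||A||_2^2 = sum(s_i^2) / max(s_i)^2.
    s = np.linalg.svd(matrix, compute_uv=False)
    return float(np.sum(s**2) / np.max(s) ** 2)


identity = np.eye(3)                          # all singular values are equal
rank_one = np.outer([1.0, 2.0], [3.0, 4.0])   # a single nonzero singular value

print(stable_rank_sketch(identity))
print(stable_rank_sketch(rank_one))
print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0])))
```

Under this definition the stable rank is at most the ordinary rank, and it is less sensitive to tiny singular values, which is why it is often used as a smooth proxy for rank.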
@cloneofsimo
cloneofsimo / justreadwds.py
Created June 1, 2024 10:07
Read? I get about 1000 img / s
import os
import json
from PIL import Image
import logging
from torch.utils.data import DataLoader
import webdataset as wds
import argparse
from tqdm import tqdm
import numpy as np
from torchvision import transforms
@cloneofsimo
cloneofsimo / lance.py
Created May 30, 2024 20:39
lance dataset concurrent writes?
import lance
import pyarrow as pa
import numpy as np
import time
import os
import multiprocessing as mp
def producer(N=1):
    yield pa.RecordBatch.from_arrays([
@cloneofsimo
cloneofsimo / dockersetup.md
Last active May 1, 2024 09:16
dockersetup.md

Setting up a Docker environment with PyTorch on your Linux server so that your friends can access it involves several steps: installing Docker, pulling or building a PyTorch Docker image, and configuring Docker so that it is securely accessible from other machines. Here’s a step-by-step guide:

Step 1: Install Docker

  1. Update your system: Ensure your package lists and installed packages are updated.

    sudo apt update && sudo apt upgrade -y

  2. Install Docker: Install Docker using Docker's convenience script, which works on many Linux distributions.

@cloneofsimo
cloneofsimo / common.md
Last active June 18, 2024 10:48
Very common stuff that I forget all the time and I really shouldn't

clean clean

black .
autoflake --remove-all-unused-imports -i ./*
isort .

Check basic stuff

@cloneofsimo
cloneofsimo / unfold_images.py
Created September 24, 2023 13:32
Got confused by Unfold operation, yet again LOL
import torch
import torch.nn.functional as F
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
# Load an example image
image_path = "/home/simo/just_dl_stuff/vid2data/vender_1.png"
img = Image.open(image_path).convert("RGB")