Vlad Lialin (Guitaricet)
  • UMass Lowell
  • Lowell, MA
@Guitaricet
Guitaricet / better_repr.py
Last active December 8, 2022 06:01
Read PyTorch module descriptions more easily
from torch.nn import ModuleList

def _addindent(s_, numSpaces):
    s = s_.split('\n')
    # don't do anything for single-line stuff
    if len(s) == 1:
        return s_
    first = s.pop(0)
    s = [(numSpaces * ' ') + line for line in s]
    # rest of the helper, as in torch.nn.modules.module._addindent
    s = '\n'.join(s)
    return first + '\n' + s
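The preview above only shows the _addindent helper borrowed from torch.nn.modules.module. As a rough illustration of the gist's goal (shorter module descriptions), here is one possible use of it; the function name better_repr and the ModuleList-collapsing behavior are assumptions, not necessarily what the gist actually does.

# Sketch only (assumption): collapse repeated children of a ModuleList into one "N x child" line.
def better_repr(module):
    child_lines = []
    for name, child in module.named_children():
        if isinstance(child, ModuleList) and len(child) > 0:
            child_repr = f"{len(child)} x " + repr(child[0])
        else:
            child_repr = repr(child)
        child_lines.append(f"({name}): " + _addindent(child_repr, 2))
    body = "\n  " + "\n  ".join(child_lines) + "\n" if child_lines else ""
    return type(module).__name__ + "(" + body + ")"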
@Guitaricet
Guitaricet / named_shape.py
Created December 8, 2022 02:39
add NamedShape property to torch.Tensor
import torch

class NamedShape:
    """A convenience class to make beautifully named shapes."""

    def __init__(self, tensor) -> None:
        self.names = tensor.names
        self.shape = tensor.shape

    def __repr__(self) -> str:
        # plausible completion (the preview is truncated here): pair each name with its size
        return str(list(zip(self.names, self.shape)))
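The preview does not show how the class is attached to torch.Tensor. A minimal sketch of how the description above could be realized; the attribute name named_shape is an assumption.

# Sketch (assumption): expose NamedShape as a read-only property on every tensor.
torch.Tensor.named_shape = property(lambda self: NamedShape(self))

# usage:
# t = torch.zeros(2, 3, names=("batch", "hidden"))
# t.named_shape  ->  [('batch', 2), ('hidden', 3)]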
@Guitaricet
Guitaricet / generate_multigpu.py
Last active September 1, 2022 18:18
Short example of distributed generation. Pure PyTorch Distributed.
# torchrun --nproc_per_node 2 generate_multigpu.py
# tested on torch==1.12.1 and transformers==4.21
import os
import json
import torch
import torch.distributed as dist
import transformers
import datasets
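Only the imports survive in this preview. Below is a rough sketch of what single-node, pure torch.distributed generation under torchrun typically looks like: initialize the process group, shard the data by rank, generate on each GPU, and write per-rank outputs. The model (gpt2), dataset (xsum), and output layout are assumptions, not the gist's actual choices.

# Sketch only; model, dataset, and output layout are assumptions.
def main():
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    tokenizer = transformers.AutoTokenizer.from_pretrained("gpt2")
    model = transformers.AutoModelForCausalLM.from_pretrained("gpt2").cuda()
    model.eval()

    # each rank processes its own shard of the data
    dataset = datasets.load_dataset("xsum", split="validation")
    shard = dataset.shard(num_shards=world_size, index=rank)

    outputs = []
    for example in shard:
        inputs = tokenizer(example["document"], return_tensors="pt", truncation=True).to("cuda")
        with torch.no_grad():
            generated = model.generate(**inputs, max_new_tokens=64)
        outputs.append(tokenizer.decode(generated[0], skip_special_tokens=True))

    # simplest option: one output file per rank (alternatively, gather everything to rank 0)
    with open(f"generations_rank{rank}.json", "w") as f:
        json.dump(outputs, f)

    dist.barrier()

if __name__ == "__main__":
    main()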
@Guitaricet
Guitaricet / logger.py
Created June 16, 2022 15:41
Good default logger
import os
import sys
import logging

logging.basicConfig(
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
    stream=sys.stdout,
)
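A short usage note: with this basicConfig call executed once at startup, module-level loggers pick up the format and stream automatically, roughly like this:

logger = logging.getLogger(__name__)
logger.info("starting training")
# 2022-06-16 15:41:00 | INFO | __main__ | starting training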
@Guitaricet
Guitaricet / learning_subspaces.ipynb
Created April 5, 2021 19:33
A quick and dirty reimplementation of the training algorithm from Learning Neural Network Subspaces, Wortsman et al., https://arxiv.org/abs/2102.10472
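The notebook preview is unavailable, so here is a rough sketch of the core idea from the paper (not necessarily the notebook's code): keep two copies of the weights, sample an interpolation coefficient each step, and train the interpolated network so that gradients flow to both endpoints.

# Rough sketch of the "line of networks" training idea; names and details are assumptions.
import torch
import torch.nn as nn

class SubspaceLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # two endpoint weights defining a line in parameter space
        self.w1 = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.w2 = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x, alpha):
        # the interpolated weight is used in the forward pass,
        # so both endpoints receive gradients
        weight = (1 - alpha) * self.w1 + alpha * self.w2
        return nn.functional.linear(x, weight, self.bias)

# per training step: alpha = torch.rand(1).item(); loss = criterion(model(x, alpha), y)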
@Guitaricet
Guitaricet / lite_transformer.py
Last active April 30, 2020 19:46
Lite transformer encoder layer
class LiteTransformerLayer(nn.Module):
    def __init__(self, dim=496, heads=4, kernel_size=4):
        super().__init__()
        assert dim % 2 == 0

        self.attention = Attention(dim // 2, heads=heads)
        self.cnn = LightweightConv(dim // 2, kernel=kernel_size)  # or Dynamic conv

        self.fc = nn.Sequential(
            nn.Linear(dim, dim),
@Guitaricet
Guitaricet / transformer.py
Last active April 30, 2020 19:46
Simple transformer encoder layer
class TransformerLayer(nn.Module):
    def __init__(self, dim=496, heads=4, ffn_dim=1984):
        super().__init__()
        self.attention = Attention(dim, heads=heads)
        self.fc = nn.Sequential(
            nn.Linear(dim, ffn_dim),
            nn.ReLU(),  # choose your favorite nonlinearity here
            nn.Linear(ffn_dim, dim),
        )
@Guitaricet
Guitaricet / att.py
Last active October 20, 2021 13:42
Very simple self attention implementation
# Just dot-product self-attention
class SelfAttention(nn.Module):
    def __init__(self, dim=7):
        super().__init__()
        self.K = nn.Linear(dim, dim)
        self.Q = nn.Linear(dim, dim)
        self.V = nn.Linear(dim, dim)
        self.scale = dim ** 0.5

    def forward(self, x):
        # plausible completion (the preview is truncated here):
        # scaled dot-product attention over the sequence dimension
        scores = self.Q(x) @ self.K(x).transpose(-2, -1) / self.scale
        return scores.softmax(dim=-1) @ self.V(x)
#!/usr/bin/env python3
import sys

try:
    runarg_idx = sys.argv.index('--rundir')
    rundir = sys.argv[runarg_idx + 1]
    import os
    os.chdir(os.path.expanduser(rundir))
except ValueError:
    pass  # plausible completion: no --rundir argument was given
@Guitaricet
Guitaricet / reproducibility.md
Last active March 24, 2024 11:11
Notes on reproducibility in PyTorch

Reproducibility

ML experiments can be very hard to reproduce. You have a lot of hyperparameters, different dataset splits, different ways to preprocess your data, bugs, etc. Ideally, for every single model run you should log the data split (already preprocessed), all hyperparameters (including the learning rate schedule), the initial state of your model and optimizer, the random seeds used for initialization and dataset shuffling, and all of your code. Your GPU should also run in deterministic mode (which is not the default). This is a very hard task: a different random seed can significantly change your metrics, and even GPU-induced randomness can matter. We're not solving all of these problems here, but we should at least address what we can handle (see the seeding sketch after the list below).

For every result you report in the paper you need (at least) to:

  1. Track your model and optimizer hyperparameters (including learning rate schedule)
  2. Save final model parameters
  3. Report all of the parameters in the paper
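A minimal seeding-and-determinism sketch along these lines (the exact set of flags you need depends on your PyTorch version):

import random
import numpy as np
import torch

def set_deterministic(seed=42):
    # seed every RNG that the training loop might touch
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # prefer deterministic GPU kernels (slower, but reproducible)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # newer PyTorch versions can also enforce this globally:
    # torch.use_deterministic_algorithms(True)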