Kerem Turgutlu KeremTurgutlu

KeremTurgutlu / exp.ipynb
Last active January 10, 2024 14:29
QLORA Memory Experiments
KeremTurgutlu /
Created October 28, 2023 05:03
GPT-Eval Templates
gpt_eval_template_coherence = """
You will be given title: [TITLE] and description: [DESC] written from a set of information of a real estate listing in Turkish.
Your task is to rate the title and description on one metric.
Please make sure you read and understand these instructions carefully. Please keep this
document open while reviewing, and refer to it as needed.
Evaluation Criteria:
Coherence (1-5) - the collective quality of all sentences. We align this dimension with
KeremTurgutlu /
Last active October 7, 2023 04:44
Multipack Sampler x Flash Attention
Testing flash attention with multipacking, which packs several variable-length sequences together
and passes a single sequence of shape `1 x (bs x seqlen)` to the model to avoid padding.
An alternative is to pass a block-diagonal attention mask as the attention bias, but the following uses Flash Attention 2, which
is much faster.
Multipacking can be used to speed up both pretraining and finetuning.
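The packing idea described above can be sketched in plain Python. This is an illustrative sketch, not the gist's code: the helper names `pack_sequences` and `block_diagonal_mask` are hypothetical, and the causal masking is an assumption. It shows the two pieces of bookkeeping packing needs: cumulative sequence boundaries (the kind of input variable-length attention kernels usually expect) and the equivalent block-diagonal mask.

```python
from itertools import accumulate


def pack_sequences(seqs):
    """Concatenate variable-length token lists into one flat sequence.

    Returns the flat tokens, position ids that restart at 0 for every
    original sequence, and cumulative boundaries (cu_seqlens).
    """
    flat = [tok for seq in seqs for tok in seq]
    position_ids = [i for seq in seqs for i in range(len(seq))]
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in seqs))
    return flat, position_ids, cu_seqlens


def block_diagonal_mask(cu_seqlens):
    """Build the block-diagonal alternative as a boolean mask.

    Token i may attend to token j only if both belong to the same original
    sequence and j <= i (causal masking is an assumption of this sketch).
    """
    total = cu_seqlens[-1]
    seq_id = [0] * total
    for s, (start, end) in enumerate(zip(cu_seqlens, cu_seqlens[1:])):
        for i in range(start, end):
            seq_id[i] = s
    return [
        [seq_id[i] == seq_id[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


# Three sequences of different lengths packed into a single row, no padding.
seqs = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
flat, position_ids, cu_seqlens = pack_sequences(seqs)
mask = block_diagonal_mask(cu_seqlens)
```

The mask grows quadratically with the packed length, while the boundary list grows only with the number of sequences, which is one reason a kernel that accepts boundaries directly is attractive.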
KeremTurgutlu /
Last active July 26, 2022 03:10
EMA and SWA callbacks for different model averaging techniques
from import *
__all__ = ["EMA", "SWA"]
class EMA(Callback):
    def __init__(self, decay=0.9999):
        self.decay = decay
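The preview above is cut off after the `decay` argument. As a rough sketch of what an exponential moving average of weights usually computes, assuming the standard update `avg = decay * avg + (1 - decay) * new` (the function `ema_update` is a hypothetical helper, not part of the gist or of fastai):

```python
def ema_update(avg_params, new_params, decay=0.9999):
    """Return the updated running average for every parameter.

    Each entry follows avg <- decay * avg + (1 - decay) * new.
    """
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg_params, new_params)]


# Toy run with a small decay so the effect is visible after two steps.
avg = [0.0, 1.0]
for step_params in ([1.0, 1.0], [1.0, 1.0]):
    avg = ema_update(avg, step_params, decay=0.5)
```

With a decay close to 1, such as the default `0.9999`, the average moves very slowly, so it smooths over many thousands of optimizer steps.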
from import *
from torch.cuda.amp import autocast, GradScaler
from torch.cuda.amp.grad_scaler import _refresh_per_optimizer_state
from sam import SAM
class FastaiSched:
    def __init__(self, optimizer, max_lr):
        self.optimizer = optimizer
        self.lr_sched = combine_scheds([0.1,0.9], [SchedLin(1e-8,max_lr), SchedCos(max_lr,1e-8)])
KeremTurgutlu /
Created March 11, 2021 01:53
ZeRO optimizer example
import wandb
from fastai.callback.wandb import WandbCallback
from fastai.distributed import *
torch.backends.cudnn.benchmark = True
from zero_optimizer import ZeroRedundancyOptimizer
def after_batch(self: WandbCallback):
KeremTurgutlu /
Created March 2, 2021 22:54
Fastai WANDB Callback with DDP
def main(
    size: Param("Image resolution", int)=224,
    bs: Param("Batch Size", int)=128,
    epochs: Param("Number of epochs for training", int)=1,
    lr: Param("Learning rate for training", float)=5e-5):
    WANDB = True
    # start wandb
import os
import torch
import torch.distributed as dist
from torch.multiprocessing import Process
from torchvision import datasets, transforms
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import random
KeremTurgutlu /
Last active September 20, 2023 00:57
Debugging: Distributed InfoNCE Loss
# CLIP contrastive loss is calculated over all the negative batch samples from all the GPUs
# How to implement that?
# For more info:
import os
import sys
import tempfile
import torch
import torch.distributed as dist
import torch.nn as nn
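The question posed in the comments above can be illustrated without any GPUs. The following is a simplified single-process sketch, not the gist's implementation: `all_gather` here is a plain-Python stand-in for a collective such as `torch.distributed.all_gather`, the helper names are hypothetical, every rank is assumed to hold the same local batch size, and the temperature of 0.07 is an assumed value. The key point is that each rank scores its local samples against the embeddings gathered from all ranks, with targets shifted by `rank * local_batch_size`.

```python
import math


def all_gather(per_rank):
    """Stand-in for a distributed all-gather: concatenate every rank's batch."""
    return [row for local in per_rank for row in local]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def local_infonce(rank, image_per_rank, text_per_rank, temperature=0.07):
    """Symmetric InfoNCE loss for one rank using negatives from all ranks."""
    local_images = image_per_rank[rank]
    local_texts = text_per_rank[rank]
    all_images = all_gather(image_per_rank)
    all_texts = all_gather(text_per_rank)
    # Positive pair for local sample i sits at this index in the gathered batch.
    offset = rank * len(local_images)
    losses = []
    for i, (img, txt) in enumerate(zip(local_images, local_texts)):
        target = offset + i
        image_to_text = [dot(img, t) / temperature for t in all_texts]
        text_to_image = [dot(txt, im) / temperature for im in all_images]
        losses.append(
            0.5 * (cross_entropy(image_to_text, target)
                   + cross_entropy(text_to_image, target))
        )
    return sum(losses) / len(losses)


# Two simulated ranks, two samples each, using orthogonal unit embeddings.
basis = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
images = [basis[:2], basis[2:]]
matched_texts = [basis[:2], basis[2:]]
swapped_texts = [basis[2:], basis[:2]]  # every text is paired with the wrong image

loss_matched = local_infonce(0, images, matched_texts)
loss_swapped = local_infonce(0, images, swapped_texts)
```

Averaging the per-rank losses over all ranks gives the same value as computing the loss once on the full gathered batch, which is a useful check when debugging a real distributed version.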