Kerem Turgutlu (KeremTurgutlu)

:octocat:
Having Fun
View GitHub Profile
KeremTurgutlu / tinygemm_vs_bitblas.py
Last active July 11, 2024 20:40
HQQ Tinygemm vs BitBlas Benchmark
import torch
import numpy as np
import torchao
import bitblas
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig, Quantizer, HQQBackend
from hqq.backends.torchao import HQQLinearTorchWeightOnlynt4, patch_hqq_to_aoint4
from bitblas.cache import global_operator_cache, get_database_path
# optional int4 unpacking helpers, left disabled in the gist:
# from unpack_int4.ops import unpack_int4_packed
# unpack_cuda_compiled = torch.compile(torchao.ops.unpack_int4_to_int, mode="default", fullgraph=True)
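The preview stops at the imports; the core of such a benchmark is a timing loop over each backend's linear layer. A minimal sketch with CUDA events (the callable under test would be whichever HQQ/tinygemm or BitBlas module the gist builds; the helper name and shapes here are illustrative):

def benchmark_ms(fn, x, warmup=10, iters=100):
    # warm up so compilation and cache setup don't pollute the measurement
    for _ in range(warmup):
        fn(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # mean ms per call

# e.g.: benchmark_ms(layer, torch.randn(1, 4096, device="cuda", dtype=torch.float16))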
KeremTurgutlu / test_triton_mm.ipynb
Last active May 24, 2024 13:51
test_triton_mm.ipynb
(Notebook preview unavailable.)
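The notebook itself doesn't render here, but its title points to a Triton matmul test; a minimal sketch of the usual pattern, benchmarking against torch.matmul with triton.testing.do_bench (shapes and dtype are illustrative):

import torch
import triton.testing

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
ms = triton.testing.do_bench(lambda: torch.matmul(a, b))  # time in ms (aggregation depends on Triton version)
print(f"torch.matmul: {ms:.3f} ms")
# a custom Triton kernel would be timed the same way, then checked with
# torch.testing.assert_close against the torch.matmul reference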
KeremTurgutlu / exp.ipynb
Last active January 10, 2024 14:29
QLORA Memory Experiments
(Notebook preview unavailable.)
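This notebook doesn't render either, but the title names QLoRA memory experiments; peak GPU memory in such experiments is typically read from torch's allocator counters. A minimal sketch (the training step is whatever model you are profiling):

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one forward/backward step of the (Q)LoRA model here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gib:.2f} GiB")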
KeremTurgutlu / gpt_eval_templates.py
Created October 28, 2023 05:03
GPT-Eval Templates
gpt_eval_template_coherence = """
You will be given a title: [TITLE] and a description: [DESC], written in Turkish from the information in a real estate listing.
Your task is to rate the title and description on one metric.
Please make sure you read and understand these instructions carefully. Please keep this
document open while reviewing, and refer to it as needed.
Evaluation Criteria:
Coherence (1-5) - the collective quality of all sentences. We align this dimension with
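The [TITLE] and [DESC] placeholders are substituted before the prompt is sent to the scoring model; a minimal sketch (the helper name and sample values are hypothetical):

def fill_template(template, title, desc):
    # plain placeholder substitution; title/desc come from the listing
    return template.replace("[TITLE]", title).replace("[DESC]", desc)

prompt = fill_template(gpt_eval_template_coherence, "Deniz manzarali 3+1 daire", "Genis balkonlu, merkezi konumda...")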
KeremTurgutlu / multipack_sampler_flash_attn.py
Last active October 7, 2023 04:44
Multipack Sampler x Flash Attention
"""
Testing flash attn with multipacking which essentially packs sequences using https://github.com/imoneoi/multipack_sampler,
and passes a single sequence of `1 x (bs x seqlen)` to the model to avoid padding.
An alternative is to use block diagonal attention as attention bias, but the following uses flash attention 2 which
is much faster.
Multipacking can be used to speed up both pretraining and finetuning.
"""
TsvHttpData-1.0
https://files.pushshift.io/reddit/comments/RC_2005-12.zst
KeremTurgutlu / ema_swa.py
Last active July 26, 2022 03:10
EMA and SWA callbacks for different model averaging techniques
from fastai.vision.all import *

__all__ = ["EMA", "SWA"]

class EMA(Callback):
    "Exponential moving average of model weights; see https://fastai.github.io/timmdocs/training_modelEMA"
    order,run_valid = 5,False  # run late in the callback order, skip during validation
    def __init__(self, decay=0.9999):
        super().__init__()
        self.decay = decay
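The preview cuts off before the update itself; the standard EMA step such a callback applies after every batch looks like this (a self-contained sketch with stand-in names, not the gist's exact code):

import copy
import torch
import torch.nn as nn

model = nn.Linear(4, 4)           # stand-in for the trained model
ema_model = copy.deepcopy(model)  # shadow copy holding the averaged weights
decay = 0.9999

with torch.no_grad():
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        # shadow <- decay * shadow + (1 - decay) * live, applied after each batch
        ema_p.mul_(decay).add_(p, alpha=1 - decay)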
# Manual fastai-style LR schedule for an optimizer driven outside a Learner (used with SAM).
from fastai.vision.all import *
from torch.cuda.amp import autocast, GradScaler
from torch.cuda.amp.grad_scaler import _refresh_per_optimizer_state
from sam import SAM

class FastaiSched:
    def __init__(self, optimizer, max_lr):
        self.optimizer = optimizer
        # 10% linear warmup to max_lr, then cosine anneal back down
        self.lr_sched = combine_scheds([0.1,0.9], [SchedLin(1e-8,max_lr), SchedCos(max_lr,1e-8)])
        self.update(0)
    def update(self, pct):
        # (sketch: the preview truncates before this method) apply the scheduled lr at progress pct in [0,1]
        for g in self.optimizer.param_groups:
            g["lr"] = self.lr_sched(pct)
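Driving the schedule from a plain training loop might look like this (a sketch; `optimizer` is the SAM-wrapped optimizer from above and `total_steps` is your step budget):

sched = FastaiSched(optimizer, max_lr=1e-3)
for step in range(total_steps):
    sched.update(step / total_steps)  # fraction of training elapsed
    # forward/backward and SAM's first/second step go here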
KeremTurgutlu / zero_training.py
Created March 11, 2021 01:53
ZeRO optimizer example
import torch
import wandb
from fastai.callback.wandb import WandbCallback
from fastai.distributed import *
from zero_optimizer import ZeroRedundancyOptimizer

torch.backends.cudnn.benchmark = True  # let cuDNN pick the fastest conv algorithms

@patch
def after_batch(self: WandbCallback):
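PyTorch has since shipped this optimizer as torch.distributed.optim.ZeroRedundancyOptimizer; a minimal sketch of wrapping AdamW with it (assumes torch.distributed is already initialized; the model is a stand-in):

import torch
import torch.nn as nn
from torch.distributed.optim import ZeroRedundancyOptimizer

model = nn.Linear(8, 8).cuda()  # stand-in for the DDP-wrapped model
opt = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,  # each rank keeps only its shard of optimizer state
    lr=5e-5,
)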
KeremTurgutlu / distributed_wandb.py
Created March 2, 2021 22:54
Fastai WANDB Callback with DDP
from fastcore.script import *  # provides call_parse and Param

@call_parse
def main(
    size: Param("Image resolution", int)=224,
    bs: Param("Batch size", int)=128,
    epochs: Param("Number of epochs for training", int)=1,
    lr: Param("Learning rate for training", float)=5e-5):
    WANDB = True
    # start wandb
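Under DDP every process runs this script, so wandb should only be initialized on the main rank; a common pattern (a sketch; the project name is illustrative):

import os
import wandb

if WANDB and os.environ.get("RANK", "0") == "0":  # main process only under DDP
    wandb.init(project="fastai-ddp")              # illustrative project name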