Vishal Goklani vgoklani
@kalomaze
kalomaze / modeling_mixtral.py
Created May 5, 2024 03:38
Fixed Mixtral training code for HF Transformers
# coding=utf-8
# Copyright 2023 Mixtral AI and the HuggingFace Inc. team. All rights reserved.
#
# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
# and OPT implementations in this library. It has been modified from its
# original forms to accommodate minor architectural differences compared
# to GPT-NeoX and OPT used by the Meta AI team that trained the model.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@vgoklani
vgoklani / torch_ddp_verify.py
Created April 17, 2024 22:21 — forked from jxmorris12/torch_ddp_verify.py
verify parameter weights & gradients in pytorch
def verify_ddp_weights_equal(model: torch.nn.Module, atol: float = 1e-5) -> None:
    if hasattr(model, "module"):
        model = model.module
    world_size = get_world_size()
    for name, param in model.named_parameters():
        gathered_param = gather(param).reshape((world_size, -1))
        absolute_diffs = (gathered_param[None, 0, :] - gathered_param).abs()
        rank_params_eq = (absolute_diffs < atol).all()
        assert rank_params_eq, f"❌ param [{name}] not equal - got max_absolute_diff={absolute_diffs.max()}"
@jxmorris12
jxmorris12 / torch_ddp_verify.py
Last active April 19, 2024 15:54
verify parameter weights & gradients in pytorch
def verify_ddp_weights_equal(model: torch.nn.Module, atol: float = 1e-5) -> None:
    if hasattr(model, "module"):
        model = model.module
    world_size = get_world_size()
    for name, param in model.named_parameters():
        gathered_param = gather(param).reshape((world_size, -1))
        absolute_diffs = (gathered_param[None, 0, :] - gathered_param).abs()
        rank_params_eq = (absolute_diffs < atol).all()
        assert rank_params_eq, f"❌ param [{name}] not equal - got max_absolute_diff={absolute_diffs.max()}"
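The check in the two previews above relies on `gather` and `get_world_size` helpers that the listing does not show. As a rough illustration of the comparison step only, the sketch below applies the same broadcasted difference to a NumPy array that stands in for the gathered parameter. The names `max_rank_diff` and `params_equal_across_ranks` are hypothetical, and no distributed communication takes place here.

```python
import numpy as np


def max_rank_diff(gathered: np.ndarray) -> float:
    """Largest absolute deviation of any rank's copy from rank 0's copy.

    `gathered` has shape (world_size, numel): one flattened parameter per rank.
    """
    # gathered[None, 0, :] has shape (1, numel) and broadcasts against every row.
    absolute_diffs = np.abs(gathered[None, 0, :] - gathered)
    return float(absolute_diffs.max())


def params_equal_across_ranks(gathered: np.ndarray, atol: float = 1e-5) -> bool:
    """True when every rank's copy is within `atol` of rank 0's copy."""
    return max_rank_diff(gathered) < atol


# Simulated data (an assumption for illustration): 4 ranks, 3 values each.
synced = np.tile(np.array([0.5, -1.0, 2.0]), (4, 1))
drifted = synced.copy()
drifted[2, 1] += 1e-3  # rank 2 has drifted on one element
```

Comparing every rank against rank 0 is enough to detect divergence, because any two ranks that differ from each other must also include at least one that differs from rank 0.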
@Birch-san
Birch-san / llama_flash.py
Last active January 22, 2024 06:05
Loading llama with Flash Attention
from transformers import (
    AutoConfig,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
    AutoModelForCausalLM,
    LlamaTokenizerFast,
    PreTrainedModel,
    TextIteratorStreamer,
    StoppingCriteria,
@mara004
mara004 / pypdfjs.py
Last active May 5, 2024 14:39
PDF rendering with pdf.js, from Python
# SPDX-FileCopyrightText: 2023 mara004
# SPDX-License-Identifier: CC-BY-4.0 OR Apache-2.0
# See also https://github.com/extremeheat/JSPyBridge/blob/master/examples/python/pdfjs.py
# Py-Depends: pillow, javascript >= 1.1.0 (jspybridge)
# Js-Depends: pdfjs-dist, canvas
# Use `python -m pip install` and `python -m javascript --install`
import argparse
@codekansas
codekansas / benchmark_self_attention.py
Last active March 11, 2023 18:34
Benchmarking script for attention
import argparse
import contextlib
import logging
import math
import random
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable
@Chillee
Chillee / 1-pw_op_fusion.py
Last active February 26, 2024 20:45
PT 2.0 Benchmarks
import torch
import torch._inductor.config
import time
torch._inductor.config.triton.cudagraphs = False
torch.set_float32_matmul_precision('high')
def bench(f, name=None, iters=100, warmup=5, display=True, profile=False):
    for _ in range(warmup):
        f()
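The preview of `bench` stops after the warm-up loop. The sketch below shows one way a warm-up-then-time helper of this shape could be finished. It is not the gist's actual body: it assumes CPU wall-clock timing with `time.perf_counter`, drops the `profile` parameter, and is named `bench_sketch` (a hypothetical name). Timing GPU work would additionally need a device synchronization before each clock read.

```python
import time


def bench_sketch(f, name=None, iters=100, warmup=5, display=True):
    """Return the mean wall-clock time per call of `f`, in microseconds."""
    # Warm-up calls run first and are not timed.
    for _ in range(warmup):
        f()

    start = time.perf_counter()
    for _ in range(iters):
        f()
    elapsed_us = (time.perf_counter() - start) / iters * 1e6

    if display:
        label = name if name is not None else getattr(f, "__name__", "fn")
        print(f"{label}: {elapsed_us:.2f} us per iteration")
    return elapsed_us
```

A call such as `bench_sketch(lambda: sum(range(1000)), name="sum")` prints one line and returns the per-iteration time.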
Work | Details
Augmenting convnets with aggregated attention | Tutorial by Aritra
Train a Vision Transformer on small datasets | Tutorial by Aritra
MobileViT | Tutorial by Sayak
Compact Convolutional Transformers | Tutorial by Sayak
Data efficient image transformers | TF implementation, TF pre-trained models, tutorial by Sayak
Class attention image transformers | TF implementation, TF pre-trained models by Sayak
Masked Autoencoders | TF implementation, tutorial by Aritra and Sayak, contribution to Hugging Face Transformers by Aritra and Sayak
Probing the representation of ViTs
@simonster
simonster / attention_distance.py
Last active April 30, 2024 11:43
Mean attention distance
# Copyright 2022 Google LLC.
# SPDX-License-Identifier: Apache-2.0
# Author: Maithra Raghu <maithra@google.com>
def compute_distance_matrix(patch_size, num_patches, length):
    """Helper function to compute distance matrix."""
    distance_matrix = np.zeros((num_patches, num_patches))
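The preview of `compute_distance_matrix` ends right after the matrix is allocated. A minimal sketch of how such a helper might fill it in follows, under two assumptions that the listing does not confirm: patches are indexed row-major on a grid that is `length` patches wide, and distance is the Euclidean distance between patch positions scaled by `patch_size` (so it is in pixels). The name `grid_distance_matrix` is hypothetical.

```python
import numpy as np


def grid_distance_matrix(patch_size, num_patches, length):
    """Pairwise pixel distances between patches on a row-major grid (assumed layout)."""
    distance_matrix = np.zeros((num_patches, num_patches))
    for i in range(num_patches):
        for j in range(num_patches):
            if i == j:
                continue  # a patch is at distance zero from itself
            # Convert flat patch indices to (row, column) grid coordinates.
            xi, yi = divmod(i, length)
            xj, yj = divmod(j, length)
            distance_matrix[i, j] = patch_size * np.hypot(xi - xj, yi - yj)
    return distance_matrix


# Example: a 2x2 grid of 16-pixel patches.
d = grid_distance_matrix(patch_size=16, num_patches=4, length=2)
```

Weighting each row of such a matrix by a head's attention probabilities and summing gives that head's mean attention distance for one query patch, which is the quantity the gist's title refers to.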
@madlag
madlag / speed2.py
Created September 30, 2020 17:35
Pytorch CUDA speed test for various data types, with and without AMP
#!/usr/bin/env python
# Any copyright is dedicated to the Public Domain.
# https://creativecommons.org/publicdomain/zero/1.0/
# Written by Francois Fleuret <francois@fleuret.org>
# Modified by François Lagunas <francois.lagunas@m4x.org>
import time, torch