Yi Liu (yiliu30)
🌍 Working on site

  • AI Frameworks Engineer @intel
  • SH (UTC +08:00)
import torch
# User scripts
class CustomModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(10, 10)
@yiliu30
yiliu30 / forgot_to_check_out_with_recurse_submodules.md
Created March 9, 2024 11:59 — forked from cnlohr/forgot_to_check_out_with_recurse_submodules.md
Git forgot to clone recursively (forgot to check out with recurse submodules)
# https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_x86_inductor.html
import torch._inductor.config as config
import torch
import copy
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer
from torch._export import capture_pre_autograd_graph
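A minimal sketch of how these imports fit together for PT2E post-training quantization, following the linked tutorial; the toy model and example inputs below are assumptions, not part of the original snippet:

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

# 1. Export the eager model to a pre-autograd ATen graph
exported_model = capture_pre_autograd_graph(model, example_inputs)

# 2. Prepare with the X86 Inductor quantizer (inserts observers)
quantizer = X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
prepared_model = prepare_pt2e(exported_model, quantizer)

# 3. Calibrate on representative data, then convert to a quantized graph
prepared_model(*example_inputs)
quantized_model = convert_pt2e(prepared_model)

# 4. Lower the quantized graph with Inductor
optimized_model = torch.compile(quantized_model)
optimized_model(*example_inputs)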

1. Setting up the environment

To install the Intel Gaudi Software Stack and launch the docker image, please follow this guide.

docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1:latest

# Check the container ID
docker ps

# Log into the container
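# One typical way to get a shell inside the running container (not from the
# original guide; the container ID comes from `docker ps` above):
docker exec -it <container-id> /bin/bash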
# =============================================================================
# Mem tracker
# Refactored from https://github.com/pytorch/pytorch/pull/124688
# =============================================================================
import math

import torch
from torch import nn
from torch.utils.flop_counter import FlopCounterMode


def test_mt_loop():
    class DummyModel(nn.Module):
        def __init__(self, layers: int, dim: int):
            super(DummyModel, self).__init__()
            self._module_list = []
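A minimal usage sketch for FlopCounterMode, separate from the snippet above; the model and input shape here are assumptions:

model = nn.Linear(128, 128)
x = torch.randn(64, 128)
with FlopCounterMode(display=True) as flop_counter:
    model(x)
# Total FLOPs recorded for the forward pass
print(flop_counter.get_total_flops())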
@yiliu30
yiliu30 / markdonw_emoji.md
Last active December 3, 2024 05:47 — forked from rxaviers/gist:7360908
Complete list of github markdown emoji markup

People

:bowtie: :bowtie: 😄 :smile: 😆 :laughing:
😊 :blush: 😃 :smiley: ☺️ :relaxed:
😏 :smirk: 😍 :heart_eyes: 😘 :kissing_heart:
😚 :kissing_closed_eyes: 😳 :flushed: 😌 :relieved:
😆 :satisfied: 😁 :grin: 😉 :wink:
😜 :stuck_out_tongue_winking_eye: 😝 :stuck_out_tongue_closed_eyes: 😀 :grinning:
😗 :kissing: 😙 :kissing_smiling_eyes: 😛 :stuck_out_tongue:
2024-06-17T01:47:06.5681923Z ==================================== ERRORS ====================================
2024-06-17T01:47:06.5682582Z _____________ ERROR at setup of TestAutoRound.test_autoround[True] _____________
2024-06-17T01:47:06.5682735Z
2024-06-17T01:47:06.5683220Z self = <class 'test_autoround.TestAutoRound'>
2024-06-17T01:47:06.5683303Z
2024-06-17T01:47:06.5683401Z def setup_class(self):
2024-06-17T01:47:06.5683608Z self.gptj = transformers.AutoModelForCausalLM.from_pretrained(
2024-06-17T01:47:06.5684018Z "hf-internal-testing/tiny-random-GPTJForCausalLM",
2024-06-17T01:47:06.5684147Z torchscript=True,
2024-06-17T01:47:06.5684250Z )

If I understand correctly, TGI currently selects a single kernel for all layers based on the algorithm name. Would you consider extending this to allow a mapping between layer names and kernels? This would decouple the quantization process (calculating the scale and zero point for a given tensor) from the inference process (selecting the right kernel). Here are some thoughts (a rough sketch follows the list):

  • Support for mixed data types and bitwidths: This would enable models to use different precisions for different layers, maintaining higher precision for critical layers. For instance, GPTQ does not quantize the lm_head. Similarly, llama.cpp uses different bitwidths for feedforward layers and other layers. TGI currently lacks this capability.
  • Eliminate redundant unpack-repack processes: If we require all quantized models to have the GPTQ format and want to use a new kernel like marlin, the flow involves multiple packing and unpacking steps. Ideally, we should only need one packing step.
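To make the idea concrete, here is a hypothetical sketch of such a layer-name-to-kernel mapping; none of these names are existing TGI APIs, and the patterns, bit widths, and kernel names are purely illustrative:

import fnmatch

# Hypothetical per-layer quantization metadata stored with the checkpoint
LAYER_KERNEL_MAP = {
    "lm_head": {"bits": 16, "kernel": "fp16_gemm"},                   # keep critical layers in higher precision
    "model.layers.*.mlp.*": {"bits": 4, "kernel": "marlin"},          # low-bit feedforward layers
    "model.layers.*.self_attn.*": {"bits": 8, "kernel": "gptq_cuda"},
}

def select_kernel(layer_name: str) -> str:
    """Pick a kernel for a layer by matching its name against the map."""
    for pattern, cfg in LAYER_KERNEL_MAP.items():
        if fnmatch.fnmatch(layer_name, pattern):
            return cfg["kernel"]
    return "fp16_gemm"  # fall back to the unquantized kernel

print(select_kernel("model.layers.0.mlp.down_proj"))  # -> "marlin"

With a map like this shipped alongside the model config, the serving side only has to look up the kernel per layer instead of re-deriving it from a single global algorithm name.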
@yiliu30
yiliu30 / 1-pw_op_fusion.py
Created July 7, 2024 04:12 — forked from Chillee/1-pw_op_fusion.py
PT 2.0 Benchmarks
import torch
import torch._inductor.config
import time
torch._inductor.config.triton.cudagraphs = False
torch.set_float32_matmul_precision('high')
def bench(f, name=None, iters=100, warmup=5, display=True, profile=False):
    for _ in range(warmup):
        f()
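A minimal usage sketch for the bench helper above; the benchmarked function and the CUDA device are assumptions, and the rest of bench's body is truncated in this preview:

def matmul():
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    return a @ b

bench(matmul, name="matmul", iters=50)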