fxmarty

fxmarty / profiling_onnxruntime.py
Created November 22, 2022 15:00
Plot onnxruntime profiling
import torch
import json
import pandas as pd
import matplotlib.pyplot as plt
import os
from pathlib import Path
import onnxruntime
from tqdm import tqdm
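The preview above stops at the imports. As a rough illustration of what summarizing an onnxruntime profile can look like, here is a minimal sketch that sums time per operator. It assumes the Chrome-trace-style JSON that onnxruntime writes when `SessionOptions.enable_profiling` is set (the file path is returned by `InferenceSession.end_profiling()`), with per-node events carrying `cat`, `dur`, and `args.op_name`. `SAMPLE_PROFILE` and `total_duration_per_op` are hypothetical names, the sample events are hand-written rather than real measurements, and the field names should be checked against an actual profile file. This is not the gist's own code.

```python
import json
from collections import defaultdict

# Hand-written sample in the assumed profile layout (not real measurements).
SAMPLE_PROFILE = json.dumps([
    {"cat": "Session", "name": "model_run", "dur": 500, "args": {}},
    {"cat": "Node", "name": "MatMul_0_kernel_time", "dur": 120, "args": {"op_name": "MatMul"}},
    {"cat": "Node", "name": "MatMul_1_kernel_time", "dur": 80, "args": {"op_name": "MatMul"}},
    {"cat": "Node", "name": "Softmax_0_kernel_time", "dur": 30, "args": {"op_name": "Softmax"}},
])


def total_duration_per_op(profile_json: str) -> dict:
    """Sum the `dur` field of node-level events, grouped by operator type."""
    totals = defaultdict(int)
    for event in json.loads(profile_json):
        # Skip session-level events; only per-node events are aggregated.
        if event.get("cat") != "Node":
            continue
        op_name = event.get("args", {}).get("op_name")
        if op_name is None:
            continue
        totals[op_name] += event["dur"]
    return dict(totals)


per_op = total_duration_per_op(SAMPLE_PROFILE)
# Print operators from most to least total time.
for op, dur in sorted(per_op.items(), key=lambda item: item[1], reverse=True):
    print(f"{op}: {dur}")
```

The resulting dictionary can be passed to `pandas.Series(per_op).plot.bar()` to get a plot, which is presumably why the gist imports pandas and matplotlib.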
fxmarty / script.py
Created November 18, 2022 10:08
Compare variable batch size vs fixed very long batch size
"""
A minimal script to compare inference with variable batch sizes vs a fixed batch size long enough to handle all cases.
Change `padding_style` to compare.
"""
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
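The preview does not show how `padding_style` is used. The sketch below illustrates the two strategies the docstring appears to contrast, assuming that one pads each batch to its longest sequence and the other pads every batch to a fixed length large enough for all inputs. The names `pad_batch`, `padded_fraction`, and `PAD_ID`, and the style values `"longest"` and `"max_length"`, are hypothetical and not taken from the gist.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id


def pad_batch(batch: List[List[int]], padding_style: str, max_length: int = 8) -> List[List[int]]:
    """Pad token-id sequences either to the longest in the batch or to a fixed length."""
    if padding_style == "longest":
        target = max(len(seq) for seq in batch)
    elif padding_style == "max_length":
        target = max_length
    else:
        raise ValueError(f"unknown padding_style: {padding_style!r}")
    if any(len(seq) > target for seq in batch):
        raise ValueError("a sequence is longer than the target length")
    return [seq + [PAD_ID] * (target - len(seq)) for seq in batch]


def padded_fraction(batch: List[List[int]], padded: List[List[int]]) -> float:
    """Fraction of positions in the padded batch that are padding."""
    real = sum(len(seq) for seq in batch)
    total = sum(len(row) for row in padded)
    return 1 - real / total


batch = [[5, 6, 7], [8, 9], [10]]
dynamic = pad_batch(batch, "longest")
fixed = pad_batch(batch, "max_length", max_length=8)
print(padded_fraction(batch, dynamic), padded_fraction(batch, fixed))
```

With the fixed length, far more of each batch is padding, which is the extra work that this kind of comparison measures.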
fxmarty / benchmark.py
Last active November 14, 2022 16:35
Benchmark torchdynamo vs vanilla pytorch
import torch
torch.backends.cuda.matmul.allow_tf32 = True
import argparse
import copy
from tqdm import tqdm
from transformers import AutoModel
import torch._dynamo as dynamo
fxmarty / Dockerfile
Last active November 14, 2022 16:13
Dockerfile to test torchdynamo
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
# to be used along with https://github.com/sgugger/torchdynamo-tests
# build with `docker build -f Dockerfile -t container-torchdynamo .`
# run with `docker run --gpus device=4 -it -v $(pwd)/scripts:/workspace container-torchdynamo:latest python verify_dynamo.py`
# and then
# run with `docker run --gpus device=4 -it -v $(pwd)/scripts:/workspace container-torchdynamo:latest python benchmark.py --use-cuda`
# `verify_dynamo.py`: comes from https://github.com/sgugger/torchdynamo-tests
View table_bert_base_half_t4.md
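| batch_size | seq_len | pad_percentage | HF_time | BT_time | Speedup |
|---|---|---|---|---|---|
| 8 | 64 | 0 | 11.26 | 6.47 | 1.74 |
| 8 | 64 | 0.1 | 11.09 | 6.63 | 1.67 |
| 8 | 64 | 0.2 | 11.4 | 6.56 | 1.74 |
| 8 | 64 | 0.5 | 11.14 | 6.47 | 1.72 |
| 8 | 64 | 0.75 | 11.57 | 6.56 | 1.76 |
| 8 | 128 | 0 | 14.26 | 12.09 | 1.18 |
| 8 | 128 | 0.1 | 14.5 | 12.21 | 1.19 |
| 8 | 128 | 0.2 | 14.79 | 10.96 | 1.35 |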
View table_bert_large_half_t4.md
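| batch_size | seq_len | pad_percentage | HF_time | BT_time | Speedup |
|---|---|---|---|---|---|
| 8 | 64 | 0 | 25.16 | 13.5 | 1.86 |
| 8 | 64 | 0.1 | 24.83 | 13.8 | 1.8 |
| 8 | 64 | 0.2 | 24.82 | 13.48 | 1.84 |
| 8 | 64 | 0.5 | 24.6 | 13.33 | 1.85 |
| 8 | 64 | 0.75 | 24.64 | 13.04 | 1.89 |
| 8 | 128 | 0 | 25.47 | 13.46 | 1.89 |
| 8 | 128 | 0.1 | 25.54 | 13.84 | 1.85 |
| 8 | 128 | 0.2 | 25.62 | 13.65 | 1.88 |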
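
The tables do not say how the Speedup column is derived. It appears to be `HF_time / BT_time` rounded to two decimals. The sketch below checks that reading against the rows of the table above. The names `ROWS` and `speedup` are hypothetical, and the small tolerance allows for the reported values having been computed from unrounded timings.

```python
# Rows copied from the table above:
# (batch_size, seq_len, pad_percentage, HF_time, BT_time, reported Speedup)
ROWS = [
    (8, 64, 0, 25.16, 13.5, 1.86),
    (8, 64, 0.1, 24.83, 13.8, 1.8),
    (8, 64, 0.2, 24.82, 13.48, 1.84),
    (8, 64, 0.5, 24.6, 13.33, 1.85),
    (8, 64, 0.75, 24.64, 13.04, 1.89),
    (8, 128, 0, 25.47, 13.46, 1.89),
    (8, 128, 0.1, 25.54, 13.84, 1.85),
    (8, 128, 0.2, 25.62, 13.65, 1.88),
]


def speedup(hf_time: float, bt_time: float) -> float:
    """Ratio of the two timings (assumed definition of the Speedup column)."""
    return hf_time / bt_time


# Largest gap between the recomputed ratio and the reported column.
max_gap = max(abs(speedup(hf, bt) - reported) for *_, hf, bt, reported in ROWS)
print(f"largest difference from the reported Speedup: {max_gap:.4f}")
```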