batch_size | seq_len | pad_percentage | HF_time | BT_time | Speedup |
---|---|---|---|---|---|
8 | 64 | 0 | 11.26 | 6.47 | 1.74 |
8 | 64 | 0.1 | 11.09 | 6.63 | 1.67 |
8 | 64 | 0.2 | 11.4 | 6.56 | 1.74 |
8 | 64 | 0.5 | 11.14 | 6.47 | 1.72 |
8 | 64 | 0.75 | 11.57 | 6.56 | 1.76 |
8 | 128 | 0 | 14.26 | 12.09 | 1.18 |
8 | 128 | 0.1 | 14.5 | 12.21 | 1.19 |
8 | 128 | 0.2 | 14.79 | 10.96 | 1.35 |
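The Speedup column in the table above appears to be the ratio HF_time / BT_time, rounded to two decimals. A minimal sketch that recomputes it for a few of the rows follows. It assumes HF_time is the baseline timing and BT_time the optimized one; the table does not state units, and the `speedup` helper is a hypothetical name introduced here for illustration.

```python
# Rows copied from the table above: (HF_time, BT_time, reported Speedup).
rows = [
    (11.26, 6.47, 1.74),
    (11.09, 6.63, 1.67),
    (14.26, 12.09, 1.18),
    (14.79, 10.96, 1.35),
]


def speedup(hf_time: float, bt_time: float) -> float:
    """Hypothetical helper: baseline time divided by optimized time, rounded to 2 places."""
    return round(hf_time / bt_time, 2)


for hf, bt, reported in rows:
    # Print the recomputed ratio next to the value reported in the table.
    print(f"{hf} / {bt} = {speedup(hf, bt)} (reported {reported})")
```

Because both timings are in the same unit, the ratio is dimensionless, so the check holds whatever unit the measurements were taken in.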
profiling_onnxruntime.py
import torch
import json
import pandas as pd
import matplotlib.pyplot as plt
import os
from pathlib import Path
import onnxruntime
from tqdm import tqdm
script.py
"""
A minimal script to compare inference with variable batch sizes vs a fixed batch size long enough to handle all cases.
Change `padding_style` to compare.
"""
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
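The docstring of script.py describes switching a `padding_style` setting to compare variable-size inputs against a fixed size large enough for every case. The preview is truncated, so the sketch below is not the script's code. It is a library-free illustration of the two padding strategies, assuming the two styles correspond to padding to the longest sequence in the batch and padding to a fixed length. The function `pad_batch`, the style names, and the sample token ids are all hypothetical.

```python
def pad_batch(batch, padding_style, max_length=16, pad_id=0):
    """Pad a list of token-id lists to a common length (hypothetical helper)."""
    longest = max(len(seq) for seq in batch)
    if padding_style == "longest":
        # Variable size: each batch is only as wide as its longest sequence.
        target = longest
    elif padding_style == "max_length":
        # Fixed size: every batch has the same width, chosen to fit all cases.
        if longest > max_length:
            raise ValueError("max_length is too small for this batch")
        target = max_length
    else:
        raise ValueError(f"unknown padding_style: {padding_style!r}")
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]


batch = [[5, 6, 7], [8, 9], [1, 2, 3, 4, 5]]
dynamic = pad_batch(batch, "longest")
fixed = pad_batch(batch, "max_length", max_length=16)

# Compare how many positions the model would have to process in each case.
print(sum(len(seq) for seq in dynamic), "positions with 'longest'")
print(sum(len(seq) for seq in fixed), "positions with 'max_length'")
```

Fixed-length padding keeps input shapes constant across batches at the cost of computing over more padding positions, which is the trade-off such a comparison would measure.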
benchmark.py
import torch
torch.backends.cuda.matmul.allow_tf32 = True
import argparse
import copy
from tqdm import tqdm
from transformers import AutoModel
import torch._dynamo as dynamo
Dockerfile
FROM nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04
# to be used alongside https://github.com/sgugger/torchdynamo-tests
# build with `docker build -f Dockerfile -t container-torchdynamo .`
# run with `docker run --gpus device=4 -it -v $(pwd)/scripts:/workspace container-torchdynamo:latest python verify_dynamo.py`
# and then
# run with `docker run --gpus device=4 -it -v $(pwd)/scripts:/workspace container-torchdynamo:latest python benchmark.py --use-cuda`
# `verify_dynamo.py` comes from https://github.com/sgugger/torchdynamo-tests
table_bert_base_half_t4.md
table_bert_large_half_t4.md
batch_size | seq_len | pad_percentage | HF_time | BT_time | Speedup |
---|---|---|---|---|---|
8 | 64 | 0 | 25.16 | 13.5 | 1.86 |
8 | 64 | 0.1 | 24.83 | 13.8 | 1.8 |
8 | 64 | 0.2 | 24.82 | 13.48 | 1.84 |
8 | 64 | 0.5 | 24.6 | 13.33 | 1.85 |
8 | 64 | 0.75 | 24.64 | 13.04 | 1.89 |
8 | 128 | 0 | 25.47 | 13.46 | 1.89 |
8 | 128 | 0.1 | 25.54 | 13.84 | 1.85 |
8 | 128 | 0.2 | 25.62 | 13.65 | 1.88 |