@imaginary-person
Created May 18, 2021 03:39
Quantization benchmark
import torch
import time

# Elementwise-add benchmark: float32 add vs. quantized add for each quantized dtype.
x = torch.rand(1, 256, 256, 256)
y = torch.rand(1, 256, 256, 256)

print('dtype', 'ms/iter (float)', 'ms/iter (quant)', 'quant / float', sep='\t')

for dtype in [torch.quint8, torch.qint8, torch.qint32]:
    # Quantize the inputs with an arbitrary scale (0.1) and zero point (5).
    qX = torch.quantize_per_tensor(x, 0.1, 5, dtype)
    qY = torch.quantize_per_tensor(y, 0.1, 5, dtype)

    NITER = 1000

    # Time the float add.
    s = time.time()
    for i in range(NITER):
        x + y
    elapsed_float = time.time() - s
    ms_per_iter_float = elapsed_float / NITER * 1000

    # Time the quantized add (output uses the same scale and zero point).
    s = time.time()
    for i in range(NITER):
        torch.ops.quantized.add(qX, qY, 0.1, 5)
    elapsed = time.time() - s
    ms_per_iter = elapsed / NITER * 1000

    print(str(dtype), ms_per_iter_float, ms_per_iter, ms_per_iter / ms_per_iter_float, sep='\t')
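
The script above does not pin the thread count, so for the single-thread versus 32-thread numbers reported below, the intra-op thread pool would presumably be set explicitly before running it. A minimal sketch, assuming torch.set_num_threads is used for this (the exact mechanism used for the reported results is not shown in the gist):

import torch

# Assumption: pin PyTorch's intra-op parallelism to one thread for the
# "single-thread" runs; use 32 for the multi-threaded runs.
torch.set_num_threads(1)
print('intra-op threads:', torch.get_num_threads())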
imaginary-person commented May 18, 2021

Single-thread results

AVX512

dtype ms/iter (float) ms/iter (quant) Ratio of quant / float
torch.quint8 36.939911127090454 3.7805848121643066 0.10234417725471344
torch.qint8 37.00599193572998 4.106637716293335 0.11097223723729721
torch.qint32 37.02128195762634 119.3980541229248 3.225119385643774

AVX2

dtype ms/iter (float) ms/iter (quant) Ratio of quant / float
torch.quint8 37.32784390449524 5.5174880027771 0.1478115911782585
torch.qint8 37.252843379974365 5.276324987411499 0.14163549701679523
torch.qint32 37.287841796875 126.53827834129333 3.3935532936073063

Multi-threaded results (32 threads; two NUMA nodes with 32 physical cores, hyperthreading ON)

AVX512

dtype ms/iter (float) ms/iter (quant) Ratio of quant / float
torch.quint8 9.661890745162964 0.16221380233764648 0.01678903297668271
torch.qint8 9.314903020858765 0.16486120223999023 0.017698649344047735
torch.qint32 9.200461864471436 9.027985572814941 0.9812535181171144

AVX2

dtype ms/iter (float) ms/iter (quant) Ratio of quant / float
torch.quint8 8.55333685874939 0.2317960262298584 0.027100069839146968
torch.qint8 8.752809047698975 0.22887539863586426 0.026148793763075787
torch.qint32 8.66388750076294 9.35800290107727 1.080115929512382
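
For reference, one way to collect both AVX2 and AVX512 numbers on the same machine is to cap ATen's CPU vector dispatch via the ATEN_CPU_CAPABILITY environment variable before torch initializes its kernels; this is an assumption about how the two sets of results could be produced, not something stated above. A minimal sketch:

# Assumption: restrict ATen's CPU dispatch level before torch is first used,
# e.g. to compare AVX2 kernels against AVX512 kernels on an AVX512-capable CPU.
import os
os.environ['ATEN_CPU_CAPABILITY'] = 'avx2'  # or leave unset for the default (AVX512 here)

import torch
print(torch.__config__.show())  # build/dispatch configuration, to confirm what is in use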
