@chsasank
Last active August 17, 2021 15:28
PyTorch Benchmarks on my systems

M1 MacBook Air

MacBook Air (M1, 2020)

Native (brew)

$ /opt/homebrew/bin/python3 -c 'import platform; print(platform.platform())'
macOS-11.4-arm64-arm-64bit
$ /opt/homebrew/bin/python3 pytorch-benchmark.py 
running on CPU
batchsize: 32
Forward throughput:    alexnet : 186.109976
batchsize: 32
Forward throughput:   resnet18 : 51.881408
batchsize: 32
Forward throughput:   resnet50 : 50.912847
batchsize: 32
Forward throughput:      vgg16 : 5.041840
batchsize: 32
Forward throughput: squeezenet : 104.973816

Rosetta2 (miniconda)

$ python -c 'import platform; print(platform.platform())'
macOS-10.16-x86_64-i386-64bit
$ python pytorch-benchmark.py 
running on CPU
batchsize: 32
Forward throughput:    alexnet : 23.816442
batchsize: 32
Forward throughput:   resnet18 : 9.049998
batchsize: 32
Forward throughput:   resnet50 : 8.963913
batchsize: 32
Forward throughput:      vgg16 : 1.126201
batchsize: 32
Forward throughput: squeezenet : 14.312304
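The two runs above use the same machine but different interpreter builds: a native arm64 Python from Homebrew versus an x86_64 Python from miniconda running under Rosetta 2 (note the `x86_64` in its `platform.platform()` output). A quick stdlib-only sketch for checking which architecture a given Python interpreter is running as:

```python
import platform

# platform.machine() reports the architecture the *interpreter* runs as,
# not the host CPU: on Apple Silicon, a native build reports "arm64",
# while an x86_64 build under Rosetta 2 reports "x86_64".
arch = platform.machine()
print(f'interpreter architecture: {arch}')
if arch == 'arm64':
    print('native Apple Silicon build')
elif arch == 'x86_64':
    print('x86_64 build (under Rosetta 2 if on an M1 Mac)')
```

This is why the miniconda run also reports `macOS-10.16` instead of `macOS-11.4`: the translated interpreter sees the compatibility version string.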

Asus ROG G752VS

  • GPU: GeForce GTX 1070
  • CPU: Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz

CPU

$ CUDA_VISIBLE_DEVICES=-1 python pytorch-benchmark.py
running on CPU
batchsize: 32
Forward throughput:    alexnet : 100.022931
batchsize: 32
Forward throughput:   resnet18 : 33.985795
batchsize: 32
Forward throughput:   resnet50 : 33.383028
batchsize: 32
Forward throughput:      vgg16 : 5.020049
batchsize: 32
Forward throughput: squeezenet : 49.401246

GPU

$ python pytorch-benchmark.py
gpu available
batchsize: 32
Forward throughput:    alexnet : 3769.349007
batchsize: 32
Forward throughput:   resnet18 : 1097.269584
batchsize: 32
Forward throughput:   resnet50 : 1088.810844
batchsize: 32
Forward throughput:      vgg16 : 209.317888
batchsize: 32
Forward throughput: squeezenet : 984.347351

Linux Server 1

  • GPU: GeForce GTX 1080 Ti
  • CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

CPU

$ CUDA_VISIBLE_DEVICES=-1 python pytorch-benchmark.py
running on CPU
batchsize: 32
Forward throughput:    alexnet : 184.523443
batchsize: 32
Forward throughput:   resnet18 : 57.881935
batchsize: 32
Forward throughput:   resnet50 : 62.913347
batchsize: 32
Forward throughput:      vgg16 : 9.876533
batchsize: 32
Forward throughput: squeezenet : 75.650274

GPU

$ python pytorch-benchmark.py
gpu available
batchsize: 32
Forward throughput:    alexnet : 6767.500603
batchsize: 32
Forward throughput:   resnet18 : 2036.754280
batchsize: 32
Forward throughput:   resnet50 : 2017.909513
batchsize: 32
Forward throughput:      vgg16 : 395.942114
batchsize: 32
Forward throughput: squeezenet : 1731.326387

Linux Server 2

  • CPU: AMD Ryzen Threadripper 2920X 12-Core Processor
  • GPU: NVIDIA GeForce RTX 2080 Ti

CPU

$ CUDA_VISIBLE_DEVICES=-1 python pytorch-benchmark.py
running on CPU
batchsize: 32
Forward throughput:    alexnet : 91.100886
batchsize: 32
Forward throughput:   resnet18 : 68.885591
batchsize: 32
Forward throughput:   resnet50 : 76.730487
batchsize: 32
Forward throughput:      vgg16 : 12.766969
batchsize: 32
Forward throughput: squeezenet : 83.295199

GPU

$ python pytorch-benchmark.py
gpu available
batchsize: 32
Forward throughput:    alexnet : 10234.365602
batchsize: 32
Forward throughput:   resnet18 : 2449.743998
batchsize: 32
Forward throughput:   resnet50 : 2444.856202
batchsize: 32
Forward throughput:      vgg16 : 601.530237
batchsize: 32
Forward throughput: squeezenet : 2665.512214
pytorch-benchmark.py

import time
import torch
import torch.nn as nn
import torchvision.models as models


def tput(model, name):
    """Measure forward-pass throughput (images/sec) at batch size 32."""
    print('batchsize: 32')
    input = torch.rand(32, 3, 224, 224)
    if torch.cuda.is_available():
        model = model.cuda()
        input = input.cuda()
        n = 1000
    else:
        n = 50
    # warm-up pass before timing
    model(input)
    T = 0
    for _ in range(n):
        t1 = time.time()
        model(input)
        t2 = time.time()
        T += (t2 - t1)
    T /= n  # mean seconds per forward pass
    print('Forward throughput: %10s : %f' % (name, 32 / T))


if __name__ == '__main__':
    if torch.cuda.is_available():
        print('gpu available')
    else:
        print('running on CPU')
    alexnet = models.alexnet()
    resnet18 = models.resnet18()
    resnet50 = models.resnet50()
    vgg16 = models.vgg16()
    squeezenet = models.squeezenet1_0()
    tput(alexnet, 'alexnet')
    tput(resnet18, 'resnet18')
    tput(resnet50, 'resnet50')
    tput(vgg16, 'vgg16')
    tput(squeezenet, 'squeezenet')
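Two caveats about the timing loop above: it runs under autograd (no `torch.no_grad()`), and on GPU it reads `time.time()` without calling `torch.cuda.synchronize()`, so CUDA's asynchronous kernel launches can skew per-iteration numbers. A hedged sketch of a tighter measurement, timing the whole loop between two synchronization points (the name `tput_sync` and its parameters are my own, not part of the gist):

```python
import time
import torch
import torch.nn as nn


def tput_sync(model, name, batch=32, iters=50):
    """Forward-throughput sketch that synchronizes CUDA before reading
    the clock and disables autograd, timing the loop as a whole."""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device).eval()
    x = torch.rand(batch, 3, 224, 224, device=device)
    with torch.no_grad():
        model(x)  # warm-up pass (allocations, cuDNN autotuning)
        if device == 'cuda':
            torch.cuda.synchronize()  # drain queued kernels before timing
        t1 = time.time()
        for _ in range(iters):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()  # wait for all timed kernels to finish
        t2 = time.time()
    tput = batch * iters / (t2 - t1)
    print('Forward throughput: %10s : %f' % (name, tput))
    return tput
```

On CPU the synchronize calls are no-ops and the numbers should track the script above; on GPU, `no_grad` alone typically raises reported throughput noticeably.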