@chsasank
Last active August 17, 2021 15:28
PyTorch Benchmarks on my systems

M1 MacBook Air

MacBook Air (M1, 2020)

Native (brew)

$ /opt/homebrew/bin/python3 -c 'import platform; print(platform.platform())'
macOS-11.4-arm64-arm-64bit
$ /opt/homebrew/bin/python3 pytorch-benchmark.py 
running on CPU
batchsize: 32
Forward throughput:    alexnet : 186.109976
batchsize: 32
Forward throughput:   resnet18 : 51.881408
batchsize: 32
Forward throughput:   resnet50 : 50.912847
batchsize: 32
Forward throughput:      vgg16 : 5.041840
batchsize: 32
Forward throughput: squeezenet : 104.973816

Rosetta2 (miniconda)

$ python -c 'import platform; print(platform.platform())'
macOS-10.16-x86_64-i386-64bit
$ python pytorch-benchmark.py 
running on CPU
batchsize: 32
Forward throughput:    alexnet : 23.816442
batchsize: 32
Forward throughput:   resnet18 : 9.049998
batchsize: 32
Forward throughput:   resnet50 : 8.963913
batchsize: 32
Forward throughput:      vgg16 : 1.126201
batchsize: 32
Forward throughput: squeezenet : 14.312304
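The two runs above use the same machine but different interpreter builds: a native arm64 Python from Homebrew versus an x86_64 Python from miniconda running under Rosetta 2 (note the `x86_64` in its `platform.platform()` output). A quick stdlib-only sketch for checking which architecture a given Python interpreter is running as:

```python
import platform

# platform.machine() reports the architecture the *interpreter* runs as,
# not the host CPU: on Apple Silicon, a native build reports "arm64",
# while an x86_64 build under Rosetta 2 reports "x86_64".
arch = platform.machine()
print(f'interpreter architecture: {arch}')
if arch == 'arm64':
    print('native Apple Silicon build')
elif arch == 'x86_64':
    print('x86_64 build (under Rosetta 2 if on an M1 Mac)')
```

This is why the miniconda run also reports `macOS-10.16` instead of `macOS-11.4`: the translated interpreter sees the compatibility version string.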

Asus ROG G752VS

  • GPU: GeForce GTX 1070
  • CPU: Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz

CPU

$ CUDA_VISIBLE_DEVICES=-1 python pytorch-benchmark.py
running on CPU
batchsize: 32
Forward throughput:    alexnet : 100.022931
batchsize: 32
Forward throughput:   resnet18 : 33.985795
batchsize: 32
Forward throughput:   resnet50 : 33.383028
batchsize: 32
Forward throughput:      vgg16 : 5.020049
batchsize: 32
Forward throughput: squeezenet : 49.401246

GPU

$ python pytorch-benchmark.py
gpu available
batchsize: 32
Forward throughput:    alexnet : 3769.349007
batchsize: 32
Forward throughput:   resnet18 : 1097.269584
batchsize: 32
Forward throughput:   resnet50 : 1088.810844
batchsize: 32
Forward throughput:      vgg16 : 209.317888
batchsize: 32
Forward throughput: squeezenet : 984.347351

Linux Server 1

  • GPU: GeForce GTX 1080 Ti
  • CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

CPU

$ CUDA_VISIBLE_DEVICES=-1 python pytorch-benchmark.py
running on CPU
batchsize: 32
Forward throughput:    alexnet : 184.523443
batchsize: 32
Forward throughput:   resnet18 : 57.881935
batchsize: 32
Forward throughput:   resnet50 : 62.913347
batchsize: 32
Forward throughput:      vgg16 : 9.876533
batchsize: 32
Forward throughput: squeezenet : 75.650274

GPU

$ python pytorch-benchmark.py
gpu available
batchsize: 32
Forward throughput:    alexnet : 6767.500603
batchsize: 32
Forward throughput:   resnet18 : 2036.754280
batchsize: 32
Forward throughput:   resnet50 : 2017.909513
batchsize: 32
Forward throughput:      vgg16 : 395.942114
batchsize: 32
Forward throughput: squeezenet : 1731.326387

Linux Server 2

  • CPU: AMD Ryzen Threadripper 2920X 12-Core Processor
  • GPU: NVIDIA GeForce RTX 2080 Ti

CPU

$ CUDA_VISIBLE_DEVICES=-1 python pytorch-benchmark.py
running on CPU
batchsize: 32
Forward throughput:    alexnet : 91.100886
batchsize: 32
Forward throughput:   resnet18 : 68.885591
batchsize: 32
Forward throughput:   resnet50 : 76.730487
batchsize: 32
Forward throughput:      vgg16 : 12.766969
batchsize: 32
Forward throughput: squeezenet : 83.295199

GPU

$ python pytorch-benchmark.py
gpu available
batchsize: 32
Forward throughput:    alexnet : 10234.365602
batchsize: 32
Forward throughput:   resnet18 : 2449.743998
batchsize: 32
Forward throughput:   resnet50 : 2444.856202
batchsize: 32
Forward throughput:      vgg16 : 601.530237
batchsize: 32
Forward throughput: squeezenet : 2665.512214
pytorch-benchmark.py

import time
import torch
import torch.nn as nn
import torchvision.models as models


def tput(model, name):
    """Measure forward-pass throughput (images/sec) at batch size 32."""
    print('batchsize: 32')
    input = torch.rand(32, 3, 224, 224)
    if torch.cuda.is_available():
        model = model.cuda()
        input = input.cuda()
        n = 1000
    else:
        n = 50
    # warm-up pass before timing
    model(input)
    T = 0
    for _ in range(n):
        t1 = time.time()
        model(input)
        t2 = time.time()
        T += (t2 - t1)
    T /= n  # mean seconds per forward pass
    print('Forward throughput: %10s : %f' % (name, 32 / T))


if __name__ == '__main__':
    if torch.cuda.is_available():
        print('gpu available')
    else:
        print('running on CPU')
    alexnet = models.alexnet()
    resnet18 = models.resnet18()
    resnet50 = models.resnet50()
    vgg16 = models.vgg16()
    squeezenet = models.squeezenet1_0()
    tput(alexnet, 'alexnet')
    tput(resnet18, 'resnet18')
    tput(resnet50, 'resnet50')
    tput(vgg16, 'vgg16')
    tput(squeezenet, 'squeezenet')
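Two caveats about the timing loop above: it runs under autograd (no `torch.no_grad()`), and on GPU it reads `time.time()` without calling `torch.cuda.synchronize()`, so CUDA's asynchronous kernel launches can skew per-iteration numbers. A hedged sketch of a tighter measurement, timing the whole loop between two synchronization points (the name `tput_sync` and its parameters are my own, not part of the gist):

```python
import time
import torch
import torch.nn as nn


def tput_sync(model, name, batch=32, iters=50):
    """Forward-throughput sketch that synchronizes CUDA before reading
    the clock and disables autograd, timing the loop as a whole."""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device).eval()
    x = torch.rand(batch, 3, 224, 224, device=device)
    with torch.no_grad():
        model(x)  # warm-up pass (allocations, cuDNN autotuning)
        if device == 'cuda':
            torch.cuda.synchronize()  # drain queued kernels before timing
        t1 = time.time()
        for _ in range(iters):
            model(x)
        if device == 'cuda':
            torch.cuda.synchronize()  # wait for all timed kernels to finish
        t2 = time.time()
    tput = batch * iters / (t2 - t1)
    print('Forward throughput: %10s : %f' % (name, tput))
    return tput
```

On CPU the synchronize calls are no-ops and the numbers should track the script above; on GPU, `no_grad` alone typically raises reported throughput noticeably.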