@mkolod
Created September 14, 2018 21:32
import mxnet as mx
from mxnet.gluon.model_zoo import vision
import os
import time
batch_shape = (1, 3, 224, 224)
resnet18 = vision.resnet18_v2(pretrained=True)
resnet18.hybridize()
resnet18.forward(mx.nd.zeros(batch_shape))
resnet18.export('resnet18_v2')
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet18_v2', 0)
# Create sample input
input = mx.nd.zeros(batch_shape)
# Execute with MXNet
os.environ['MXNET_USE_TENSORRT'] = '0'
executor = sym.simple_bind(ctx=mx.gpu(0), data=batch_shape, grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)
# Warmup
print('Warming up MXNet')
for i in range(0, 10):
    y_gen = executor.forward(is_train=False, data=input)
    y_gen[0].wait_to_read()
# Timing
print('Starting MXNet timed run')
start = time.process_time()
for i in range(0, 10000):
    y_gen = executor.forward(is_train=False, data=input)
    y_gen[0].wait_to_read()
end = time.process_time()
print(end - start)
# Execute with TensorRT
print('Building TensorRT engine')
os.environ['MXNET_USE_TENSORRT'] = '1'
arg_params.update(aux_params)
all_params = dict([(k, v.as_in_context(mx.gpu(0))) for k, v in arg_params.items()])
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0), all_params=all_params,
                                             data=batch_shape, grad_req='null', force_rebind=True)
# Warmup
print('Warming up TensorRT')
for i in range(0, 10):
    y_gen = executor.forward(is_train=False, data=input)
    y_gen[0].wait_to_read()
# Timing
print('Starting TensorRT timed run')
start = time.process_time()
for i in range(0, 10000):
    y_gen = executor.forward(is_train=False, data=input)
    y_gen[0].wait_to_read()
end = time.process_time()
print(end - start)
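
A note on measurement: time.process_time() counts only the CPU time of the Python process and excludes time spent blocked, so with work happening on the GPU a wall-clock timer such as time.monotonic() is often the safer choice. A minimal, self-contained sketch of such a helper (my own addition, not part of the gist; the dummy workload below stands in for executor.forward plus wait_to_read):

```python
import time

def timed_run(fn, warmup=10, iterations=10000):
    """Warm up, then return total wall-clock seconds for `iterations` calls."""
    for _ in range(warmup):
        fn()
    start = time.monotonic()  # wall clock, immune to system clock adjustments
    for _ in range(iterations):
        fn()
    return time.monotonic() - start

# Dummy CPU workload standing in for executor.forward(...) + wait_to_read():
total = timed_run(lambda: sum(range(1000)), warmup=2, iterations=100)
print(f"{total / 100 * 1000:.4f} ms per call")
```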
mkolod commented Sep 14, 2018

Taken from here and tested.

Output on a 6-core Skylake CPU (Intel Core i7-7800X @ 3.50GHz) with an NVIDIA Titan V:

[21:29:29] src/c_api/c_api_executor.cc:464: TensorRT not enabled by default.  Please set the MXNET_USE_TENSORRT environment variable to 1 or call mx.contrib.tensorrt.set_use_tensorrt(True) to enable.
Warming up MXNet
[21:29:35] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:109: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
Starting MXNet timed run
31.318571072999998
Building TensorRT engine
Warming up TensorRT
Starting TensorRT timed run
19.153778316
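
For context, these totals can be turned into per-inference latency and relative speedup; a quick back-of-the-envelope calculation using the reported numbers:

```python
iterations = 10000  # timed iterations in the script above

mxnet_total_s = 31.318571072999998  # MXNet total, in seconds
trt_total_s = 19.153778316          # TensorRT total, in seconds

mxnet_ms = mxnet_total_s / iterations * 1000
trt_ms = trt_total_s / iterations * 1000
speedup = mxnet_total_s / trt_total_s

print(f"MXNet:    {mxnet_ms:.2f} ms/inference")  # ~3.13 ms
print(f"TensorRT: {trt_ms:.2f} ms/inference")    # ~1.92 ms
print(f"Speedup:  {speedup:.2f}x")               # ~1.64x
```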

The timings above are in seconds. The test was done using the Docker image

docker pull mxnet/tensorrt

and was run using nvidia-docker. To install nvidia-docker, check here for instructions.

MXNet-TensorRT can also be installed using pip packages made for CUDA 9.0 and 9.2:

pip install mxnet-tensorrt-cu90

or

pip install mxnet-tensorrt-cu92
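
After installing one of these packages, the build can be smoke-tested from Python. The package names above are from the gist; the availability check below is my own sketch:

```python
import importlib.util

# Is an MXNet build importable at all?
if importlib.util.find_spec("mxnet") is not None:
    import mxnet as mx
    print("MXNet found, version:", mx.__version__)
else:
    print("MXNet not installed; run one of the pip commands above.")
```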

For the pip installation, make sure TensorRT and OpenBLAS are already installed. For OpenBLAS on Ubuntu 16.04, run

sudo apt-get install libopenblas-base

The equivalent package can be installed on other versions of Ubuntu, as well as Debian, RHEL, CentOS, etc.
For TensorRT, you can download it here and follow the instructions on that page, or here.

As you can see, the Docker image approach is the easiest.
