Last active September 8, 2021 01:52
How to install TorchServe and get your first model running

Installing TorchServe

Machine Type

GCP, Ubuntu instance (Canonical, Ubuntu, 16.04 LTS, amd64 xenial image built on 2020-06-10, supports Shielded VM features)

To get this example to actually work, I followed the official documentation, this blog post by AWS, and this YouTube demo

  1. Install Java 11
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-11-jdk

1.1 Install Python3.7

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.7 python3.7-dev

1.2 Install PIP

sudo apt install python-pip python3-venv python3-pip
pip install --upgrade pip
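Before moving on, it can help to confirm that the tools from the steps above are actually on the PATH. This is only a minimal sketch; `require_tool` is a hypothetical helper name, not part of any package.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools installed so far; keep going even if one is missing.
for tool in java python3.7 pip; do
    require_tool "$tool" || true
done
```

Anything reported as missing is worth fixing now, since the later steps assume all three are present.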

1.3 Install the right version of CUDA

This is no small feat: the versions of CUDA, the NVIDIA driver, and PyTorch must all be compatible with each other.

  1. Install VirtualEnvWrapper
pip install virtualenvwrapper

If you run into a pip install issue, this might help.
If you have trouble locating it, this might help.
Also run pip check to make sure there are no other missing packages.
Lastly, make Python 3.7 virtualenvs easy to create by adding this alias to your ~/.bashrc:

alias mkvirtualenv3='mkvirtualenv --python=`which python3.7` '
  1. Create a torchserve3 environment and install torchserve and torch-model-archiver
mkvirtualenv3 torchserve3
pip install torch torchtext torchvision sentencepiece psutil future
pip install torchserve torch-model-archiver

Now torchserve is available in your virtualenv torchserve3.

Check that the GPU is available by running:

python -m torch.utils.collect_env

If you need to uninstall the wrong version of CUDA, see here and here.

Extra Notes on PyTorch & CUDA
  • CUDA 10.0 only works with torch==1.2 and torchvision==0.4.0, but TorchServe requires torch>=1.5
  • For our specific machine, we need driver nvidia-418, cuda-10.1, and this torch and torchvision install:
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f
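To see at a glance whether the driver, the CUDA toolkit, and the torch build line up, the three versions can be printed side by side. A rough sketch, assuming `nvidia-smi` and `nvcc` are on the PATH when installed and that `python` is the interpreter of the torchserve3 virtualenv; `show_cuda_stack` is a hypothetical helper name.

```shell
# Hypothetical helper: print the three versions that have to agree.
# Each step falls back to a short message if the tool is absent.
show_cuda_stack() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    else
        echo "driver: nvidia-smi not found"
    fi
    if command -v nvcc >/dev/null 2>&1; then
        echo "toolkit: $(nvcc --version | grep -i release)"
    else
        echo "toolkit: nvcc not found"
    fi
    # Run inside the virtualenv so that `python` is the one with torch installed.
    python -c 'import torch; print("torch:", torch.__version__, "built for CUDA", torch.version.cuda, "| GPU visible:", torch.cuda.is_available())' 2>/dev/null \
        || echo "torch: not importable from this python"
}
```

Calling `show_cuda_stack` after `workon torchserve3` prints one line each for the driver, the toolkit, and torch. If the CUDA version torch was built for does not match the installed toolkit, that mismatch is the first thing to fix.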

Start TorchServe

TorchServe needs a model_store directory from which archived models (*.mar) are served. Mine is at ~/models/model_store/.

Start TorchServe (do this in a screen) by running:

torchserve --start --model-store ~/models/model_store/
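TorchServe takes a moment to come up, so a script that starts it and immediately sends requests can fail. One way around that is to poll the health endpoint first. A sketch under assumptions: the default inference address http://localhost:8080 is in use, `curl` is installed, and `wait_for_torchserve` is a hypothetical helper name.

```shell
# Hypothetical helper: poll the health endpoint until it answers or attempts run out.
# Arguments: URL (default http://localhost:8080/ping), attempts (default 30),
# delay between attempts in seconds (default 1).
wait_for_torchserve() {
    local url="${1:-http://localhost:8080/ping}"
    local attempts="${2:-30}"
    local delay="${3:-1}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: non-zero exit on HTTP errors, -s: no progress output,
        # --max-time: give up on a single attempt after 5 seconds.
        if curl -fs --max-time 5 "$url" >/dev/null 2>&1; then
            echo "TorchServe is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "TorchServe did not answer at $url" >&2
    return 1
}
```

A typical use would be `torchserve --start --model-store ~/models/model_store/ && wait_for_torchserve` before registering any models.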

Configuring a Public API

You first need to enable SSL.
Assuming that you are using the keystore method, you need to create a file with the following:
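A minimal sketch of such a file, written out by a small shell function. Everything specific here is an assumption for illustration: the file name `config.properties`, the port 8443, the placeholder password `changeit`, binding to 0.0.0.0 so that the inference API is reachable from outside, and the hypothetical helper name `write_ts_config`. The property names are the ones I understand TorchServe to read for keystore-based SSL.

```shell
# Hypothetical helper: write a TorchServe config that enables SSL on the
# inference API via a PKCS12 keystore.
# Argument: target directory (default: current directory).
# keystore.p12 is assumed to exist already in that directory,
# e.g. created with the JDK's keytool.
write_ts_config() {
    local dir="${1:-.}"
    cat > "$dir/config.properties" <<'EOF'
inference_address=https://0.0.0.0:8443
keystore=keystore.p12
keystore_pass=changeit
keystore_type=PKCS12
EOF
}
```

With a file like this, the config would be passed as `--ts-config config.properties`, and the health check would move to `https://localhost:8443/ping` (with `curl -k` for a self-signed certificate). The management API is left at its default local address here, so it is not exposed publicly.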


Then start TorchServe (from the same directory as your keystore.p12 and the file above) with the following:

torchserve --start --model-store ~/models/model_store/ --ts-config

Archiving a Model

First clone the TorchServe repo to get access to the example model-file and extra-files:

git clone
  1. Download a trained model into your model_store directory (mine is ~/models/model_store)
wget -P ~/models/model_store
  1. Archive the model (run this in the parent directory of where your TorchServe repo directory sits)
torch-model-archiver --model-name densenet161 \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/ \
--serialized-file ~/models/model_store/densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier
  1. Move the archived model into model_store
mv densenet161.mar ~/models/model_store/

Optionally, you can serve the model directly when starting TorchServe (ideally do this in a screen):

torchserve --start --model-store ~/models/model_store/ --models densenet161=densenet161.mar

To register our DenseNet161 model from ~/models/model_store/ with a running TorchServe instance:

curl -X POST "http://localhost:8081/models?url=densenet161.mar"

To configure workers, the number of GPUs, timeouts, etc., see here.
We will add GPUs to our densenet161 model here:

curl -v -X PUT "http://localhost:8081/models/densenet161?min_worker=8&number_gpu=2&synchronous=true"

Using the batch_size (the maximum batch size the model is expected to handle) and max_batch_delay (milliseconds to wait for a batch to fill up) parameters of the management API, we can enable batch inference like so:

# set batch size to 8 and max delay to 50ms for the model densenet161
curl -X POST "localhost:8081/models?url=densenet161.mar&batch_size=8&max_batch_delay=50"
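The registration call above can be wrapped so that the batch settings and the worker count are passed together. A sketch only: `register_url` is a hypothetical helper, and `initial_workers` and `synchronous` are management API parameters that the call above does not use. The defaults of batch size 1 and a 100 ms delay are what I understand TorchServe to apply when the parameters are omitted.

```shell
# Hypothetical helper: build the registration URL for the management API.
# Arguments: .mar file name, batch size (default 1),
# max batch delay in ms (default 100), initial workers (default 1).
register_url() {
    local mar="$1"
    local batch_size="${2:-1}"
    local max_delay_ms="${3:-100}"
    local workers="${4:-1}"
    echo "http://localhost:8081/models?url=${mar}&batch_size=${batch_size}&max_batch_delay=${max_delay_ms}&initial_workers=${workers}&synchronous=true"
}
```

For example, `curl -X POST "$(register_url densenet161.mar 8 50 1)"` registers the model with batching and one worker in a single call. If densenet161 is already registered, I would expect the call to be rejected until the model is removed with `curl -X DELETE "http://localhost:8081/models/densenet161"`.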

See Running Models

curl "http://localhost:8081/models"

To see the details of a running model, for example our DenseNet161:

curl "http://localhost:8081/models/densenet161"

To simply check the health of TorchServe:

curl http://localhost:8080/ping

Make Inference Request

  1. Download a test image
curl -O
  1. Send the image to the Inference API
curl -X POST -T kitten.jpg
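For repeated requests, the call can be wrapped in two small functions. This is a sketch under assumptions: the inference API is on its default address http://localhost:8080, predictions are served under `/predictions/<model name>`, and `prediction_url` and `classify` are hypothetical helper names.

```shell
# Hypothetical helper: build the prediction URL for a registered model.
# Arguments: model name, base URL (default http://localhost:8080).
prediction_url() {
    echo "${2:-http://localhost:8080}/predictions/$1"
}

# Hypothetical helper: upload a file to a model and print the response.
# Arguments: model name, path to the input file.
classify() {
    curl -s -X POST "$(prediction_url "$1")" -T "$2"
}
```

With the model from this guide, `classify densenet161 kitten.jpg` should print the JSON response from the server.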


