jakelevi1996/.Single script example of using TensorRT and TensorFlow.md

## .Single script example of using TensorRT and TensorFlow.md

      
    Single script example of using TensorRT and TensorFlow

This Gist demonstrates a self-contained, single-script example of how to define a simple Keras CNN model, train it on MNIST, convert it to TensorRT format, and then perform GPU inference on the Jetson Nano. The script is shown below, followed by the console output.
TODO: fix errors; maybe ask question on Stack Overflow and raise issue on Nvidia website
NB: according to this answer on Stack Overflow, the error messages can be removed by limiting the GPU memory to 2 GB (after importing TensorFlow but before importing anything else), as follows:
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpu,
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]
    )
from tensorflow.keras import datasets
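
To put the 2 GB limit in context, the following is a rough back-of-the-envelope sketch of how much memory two of the tensors in the conversion script would occupy when all 10,000 test images are processed in one call. The shapes are read off the script (28x28x1 MNIST images, and a `Conv2D` layer with 10 filters and `"same"` padding). The helper name `tensor_mib` is hypothetical, and counting only these two float32 tensors is a simplifying assumption that ignores TensorRT workspace and every other allocation.

```python
from functools import reduce
from operator import mul


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor (float32 by default)."""
    num_elements = reduce(mul, shape, 1)
    return num_elements * bytes_per_element / 2**20


# All 10,000 test images passed to the model in a single call
input_mib = tensor_mib([10000, 28, 28, 1])
# Output of Conv2D(10, (5, 5), padding="same") for the same batch
conv_mib = tensor_mib([10000, 28, 28, 10])

print("input: {:.2f} MiB".format(input_mib))        # input: 29.91 MiB
print("conv output: {:.2f} MiB".format(conv_mib))   # conv output: 299.07 MiB
```

The input figure appears to match the 29.91MiB chunk reported by the allocator in the console output, which suggests the estimate is at least in the right range.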
According to this answer on the Nvidia developers' forum, for better performance, it is recommended to use pure TensorRT rather than TF-TRT, and to serialize the TensorRT engine for the next time it is used.
Below the conversion script is an analogous script that contains only the parts for loading the already-converted model and performing inference, along with its console output. Following this is a slightly updated inference script, with a GPU memory limit and slightly more detailed timing information.
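
The scripts measure elapsed time with `time.perf_counter`. As a minimal sketch of how per-stage timing could be collected, a small context manager can record each stage separately. The `timed` helper, the `timings` dictionary, and the stage names are hypothetical and not taken from the scripts, and the `sleep` calls are stand-ins for real work.

```python
from contextlib import contextmanager
from time import perf_counter, sleep

# Hypothetical store for the duration of each named stage, in seconds
timings = {}


@contextmanager
def timed(name):
    """Record the wall-clock duration of the enclosed block in `timings`."""
    t_start = perf_counter()
    try:
        yield
    finally:
        timings[name] = perf_counter() - t_start


# Stand-in workloads; in a real inference script these might be loading the
# converted model and running the inference calls
with timed("load"):
    sleep(0.01)
with timed("inference"):
    sleep(0.02)

for name, seconds in timings.items():
    print("{}: {:.3f} s".format(name, seconds))
```

Timing the first inference call separately from later ones may be worthwhile, because the console output shows a new TensorRT engine being built at inference time.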

  
## trt_convert.py
"""
Example script which trains a simple CNN for 1 epoch on a subset of MNIST, and
converts the model to TensorRT format, for enhanced performance which fully
utilises the NVIDIA GPU, and then performs inference.

Useful resources:
- https://stackoverflow.com/questions/58846828/how-to-convert-tensorflow-2-0-savedmodel-to-tensorrt
- https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel
- https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
- https://github.com/tensorflow/tensorflow/issues/34339
- https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py

Tested on the NVIDIA Jetson Nano, Python 3.6.9, tensorflow 2.1.0+nv20.4, numpy
1.16.1
"""
import os
from time import perf_counter
import numpy as np

t0 = perf_counter()

import tensorflow as tf
from tensorflow.keras import datasets, layers, models, Input
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.framework import convert_to_constants
tf.compat.v1.enable_eager_execution() # see github issue above

# Get training and test data
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1) / 255.0
x_test = np.expand_dims(x_test, -1) / 255.0

# Create model
model = models.Sequential()
# model.add(Input(shape=x_train.shape[1:], batch_size=batch_size))
model.add(layers.Conv2D(10, (5, 5), activation='relu', padding="same"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(10))

# Compile and train model
model.compile(optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

model.fit(
    x_train[:10000], y_train[:10000], validation_data=(x_test, y_test),
    batch_size=100, epochs=1,
)

# Save model
print("Saving model...")
current_dir = os.path.dirname(os.path.abspath(__file__))
model_dir = os.path.join(current_dir, "CNN_MNIST")
os.makedirs(model_dir, exist_ok=True)
# model.save(model_dir)
tf.saved_model.save(model, model_dir)


# Convert to TRT format
trt_model_dir = os.path.join(current_dir, "CNN_MNIST_TRT")
converter = trt.TrtGraphConverterV2(input_saved_model_dir=model_dir)
converter.convert()
converter.save(trt_model_dir)

t1 = perf_counter()
print("Finished TRT conversion; time taken = {:.3f} s".format(t1 - t0))


# Make predictions using the saved model, and print the results. NB: this uses
# tf.compat.v1.saved_model.load_v2, an alias for tf.saved_model.load, because
# calling tf.saved_model.load directly throws an error (for some reason it is
# expecting a sess argument)
saved_model_loaded = tf.compat.v1.saved_model.load_v2(
    export_dir=trt_model_dir, tags=[tag_constants.SERVING])
graph_func = saved_model_loaded.signatures[
    signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
graph_func = convert_to_constants.convert_variables_to_constants_v2(graph_func)
x_test_tensor = tf.convert_to_tensor(x_test, dtype=tf.float32)
preds = graph_func(x_test_tensor)[0].numpy()
print(preds.shape, y_test.shape)
accuracy = np.mean(preds.argmax(axis=1) == y_test)
print("Accuracy of predictions = {:.2f} %".format(accuracy * 100))
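
The script above passes all 10,000 test images to `graph_func` in a single call, and the console output shows a TensorRT engine being built for input shape `[10000,28,28,1]`. One possible variation is to run inference in smaller batches. This is an untested assumption on the Jetson Nano and may or may not affect the memory warnings. The sketch below shows only the batching logic, using a plain list in place of `x_test_tensor`; the helper name `batch_slices` is hypothetical.

```python
def batch_slices(num_samples, batch_size):
    """Yield slice objects that cover range(num_samples) in order."""
    for start in range(0, num_samples, batch_size):
        yield slice(start, min(start + batch_size, num_samples))


# A plain list stands in for the test data. In the script, each chunk would
# instead be x_test_tensor[s], passed to graph_func, and the per-batch
# predictions would then be concatenated along axis 0.
data = list(range(10))
chunks = [data[s] for s in batch_slices(len(data), batch_size=4)]
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The final batch is allowed to be smaller than the others, so no samples are dropped when the batch size does not divide the dataset size.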

## z1_console_output.txt
2020-06-23 23:05:36.712805: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:05:42.554064: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-06-23 23:05:42.608912: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-06-23 23:05:50.169213: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-06-23 23:05:50.233714: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.233874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-23 23:05:50.233939: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:05:50.234048: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-23 23:05:50.322334: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-23 23:05:50.457012: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-23 23:05:50.650414: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-23 23:05:50.749994: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-23 23:05:50.750325: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-23 23:05:50.750780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.751280: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.751462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-23 23:05:50.776428: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-06-23 23:05:50.777026: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x386a72d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-23 23:05:50.777074: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-23 23:05:50.911666: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.911975: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3860ce20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-23 23:05:50.912024: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-06-23 23:05:50.912473: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.912616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-23 23:05:50.912697: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:05:50.912759: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-23 23:05:50.912838: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-23 23:05:50.912917: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-23 23:05:50.912990: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-23 23:05:50.913055: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-23 23:05:50.913112: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-23 23:05:50.913330: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.913611: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:05:50.913735: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-23 23:05:50.914111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:06:03.357813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-23 23:06:03.357898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-06-23 23:06:03.357924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-06-23 23:06:03.358306: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:03.358546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:03.358737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 638 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-06-23 23:06:06.468747: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-23 23:06:09.398461: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
Train on 10000 samples, validate on 10000 samples
10000/10000 [==============================] - 25s 2ms/sample - loss: 0.9010 - acc: 0.7658 - val_loss: 0.4164 - val_acc: 0.8848
2020-06-23 23:06:31.371115: W tensorflow/python/util/util.cc:319] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-06-23 23:06:34.192400: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-06-23 23:06:36.422325: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:36.422486: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-06-23 23:06:36.422773: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-06-23 23:06:36.463648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:36.463934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-23 23:06:36.499520: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:06:36.499737: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-23 23:06:36.535623: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-23 23:06:36.535722: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-23 23:06:36.535772: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-23 23:06:36.550112: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-23 23:06:36.550174: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-23 23:06:36.550353: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:36.550542: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:36.550706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-23 23:06:36.562853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-23 23:06:36.562892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-06-23 23:06:36.563415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-06-23 23:06:36.563671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:36.563918: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:36.564112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 638 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-06-23 23:06:36.605834: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: graph_to_optimize
2020-06-23 23:06:36.605902: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: Graph size after: 27 nodes (20), 44 edges (37), time = 3.574ms.
2020-06-23 23:06:36.605924: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: function_optimizer did nothing. time = 0.171ms.
2020-06-23 23:06:37.112704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:37.112849: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-06-23 23:06:37.113030: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-06-23 23:06:37.113963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:37.114088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-23 23:06:37.114149: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:06:37.114191: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-23 23:06:37.114261: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-23 23:06:37.114310: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-23 23:06:37.114350: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-23 23:06:37.114388: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-23 23:06:37.114451: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-23 23:06:37.114597: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:37.114759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:37.114822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-23 23:06:37.114878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-23 23:06:37.114907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-06-23 23:06:37.114957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-06-23 23:06:37.115112: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:37.115295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:37.115386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 638 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-06-23 23:06:37.162831: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 6 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-06-23 23:06:37.163253: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:647] Number of TensorRT candidate segments: 1
2020-06-23 23:06:37.165049: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:748] Replaced segment 0 consisting of 19 nodes by TRTEngineOp_0.
2020-06-23 23:06:37.193543: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: tf_graph
2020-06-23 23:06:37.193616: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   constant_folding: Graph size after: 23 nodes (-4), 36 edges (-8), time = 5.55ms.
2020-06-23 23:06:37.193638: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   layout: Graph size after: 27 nodes (4), 40 edges (4), time = 4.354ms.
2020-06-23 23:06:37.193655: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   constant_folding: Graph size after: 27 nodes (0), 40 edges (0), time = 2.925ms.
2020-06-23 23:06:37.193672: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   TensorRTOptimizer: Graph size after: 9 nodes (-18), 9 edges (-31), time = 5.309ms.
2020-06-23 23:06:37.193688: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   constant_folding: Graph size after: 9 nodes (0), 9 edges (0), time = 1.451ms.
2020-06-23 23:06:37.193705: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: TRTEngineOp_0_native_segment
2020-06-23 23:06:37.193721: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   constant_folding: Graph size after: 21 nodes (0), 24 edges (0), time = 2.453ms.
2020-06-23 23:06:37.193738: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   layout: Graph size after: 21 nodes (0), 24 edges (0), time = 2.312ms.
2020-06-23 23:06:37.193756: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   constant_folding: Graph size after: 21 nodes (0), 24 edges (0), time = 2.316ms.
2020-06-23 23:06:37.193773: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   TensorRTOptimizer: Graph size after: 21 nodes (0), 24 edges (0), time = 0.187ms.
2020-06-23 23:06:37.193790: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   constant_folding: Graph size after: 21 nodes (0), 24 edges (0), time = 2.3ms.
2020-06-23 23:06:38.634755: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at trt_engine_resource_ops.cc:183 : Not found: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_0)
2020-06-23 23:06:44.595931: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:44.596081: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-06-23 23:06:44.596299: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-06-23 23:06:44.597546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:44.598133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-23 23:06:44.598415: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-23 23:06:44.598643: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-23 23:06:44.598942: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-23 23:06:44.599207: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-23 23:06:44.599466: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-23 23:06:44.599723: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-23 23:06:44.599928: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-23 23:06:44.600866: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:44.601526: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:44.601823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-23 23:06:44.602043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-23 23:06:44.602177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-06-23 23:06:44.602265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-06-23 23:06:44.602930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:44.603758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-23 23:06:44.604283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 638 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-06-23 23:06:44.695773: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: graph_to_optimize
2020-06-23 23:06:44.695856: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: Graph size after: 12 nodes (9), 12 edges (10), time = 3.697ms.
2020-06-23 23:06:44.695879: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: Graph size after: 12 nodes (0), 12 edges (0), time = 2.078ms.
2020-06-23 23:06:44.695896: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: __inference_TRTEngineOp_0_native_segment_2773
2020-06-23 23:06:44.695914: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: function_optimizer did nothing. time = 0.003ms.
2020-06-23 23:06:44.695930: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: function_optimizer did nothing. time = 0ms.
2020-06-23 23:06:45.362080: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for PartitionedCall/TRTEngineOp_0 with input shapes: [[10000,28,28,1]]
2020-06-23 23:06:45.362221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-06-23 23:06:45.440944: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-06-23 23:06:55.562779: W tensorflow/core/common_runtime/bfc_allocator.cc:424] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.00GiB (rounded to 1073742336).  Current allocation summary follows.
2020-06-23 23:06:55.563487: I tensorflow/core/common_runtime/bfc_allocator.cc:894] BFCAllocator dump for GPU_0_bfc
2020-06-23 23:06:55.563920: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (256): Total Chunks: 48, Chunks in use: 47. 12.0KiB allocated for chunks. 11.8KiB in use in bin. 864B client-requested in use in bin.
2020-06-23 23:06:55.564669: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (512): Total Chunks: 2, Chunks in use: 1. 1.2KiB allocated for chunks. 512B in use in bin. 400B client-requested in use in bin.
2020-06-23 23:06:55.565080: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1024): Total Chunks: 10, Chunks in use: 10. 10.2KiB allocated for chunks. 10.2KiB in use in bin. 9.8KiB client-requested in use in bin.
2020-06-23 23:06:55.565391: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.565686: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.565956: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.566232: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.566504: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.566839: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (65536): Total Chunks: 9, Chunks in use: 7. 684.5KiB allocated for chunks. 537.2KiB in use in bin. 535.9KiB client-requested in use in bin.
2020-06-23 23:06:55.567180: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (131072): Total Chunks: 2, Chunks in use: 2. 302.8KiB allocated for chunks. 302.8KiB in use in bin. 153.1KiB client-requested in use in bin.
2020-06-23 23:06:55.567477: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.567748: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.568016: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.568888: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569009: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569125: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569270: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16777216): Total Chunks: 1, Chunks in use: 1. 29.91MiB allocated for chunks. 29.91MiB in use in bin. 29.91MiB client-requested in use in bin.
2020-06-23 23:06:55.569405: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569533: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569673: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569832: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (268435456): Total Chunks: 1, Chunks in use: 0. 607.39MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-23 23:06:55.569988: I tensorflow/core/common_runtime/bfc_allocator.cc:917] Bin for 1.00GiB was 256.00MiB, Chunk State:
2020-06-23 23:06:55.570195: I tensorflow/core/common_runtime/bfc_allocator.cc:923]   Size: 607.39MiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev:   Size: 29.91MiB | Requested Size: 29.91MiB | in_use: 1 | bin_num: -1
2020-06-23 23:06:55.570327: I tensorflow/core/common_runtime/bfc_allocator.cc:930] Next region of size 669294592
2020-06-23 23:06:55.570481: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90000 of size 1280 next 1
2020-06-23 23:06:55.570619: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90500 of size 256 next 2
2020-06-23 23:06:55.570745: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90600 of size 256 next 5
2020-06-23 23:06:55.570871: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90700 of size 256 next 4
2020-06-23 23:06:55.570994: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90800 of size 256 next 8
2020-06-23 23:06:55.571120: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90900 of size 256 next 10
2020-06-23 23:06:55.571247: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90a00 of size 256 next 11
2020-06-23 23:06:55.571369: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90b00 of size 256 next 12
2020-06-23 23:06:55.571492: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90c00 of size 256 next 13
2020-06-23 23:06:55.571615: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90d00 of size 256 next 14
2020-06-23 23:06:55.571740: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90e00 of size 256 next 15
2020-06-23 23:06:55.571861: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e90f00 of size 256 next 3
2020-06-23 23:06:55.571998: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e91000 of size 1024 next 6
2020-06-23 23:06:55.572127: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e91400 of size 1024 next 16
2020-06-23 23:06:55.572618: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00e91800 of size 156160 next 9
2020-06-23 23:06:55.572693: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eb7a00 of size 78592 next 7
2020-06-23 23:06:55.572759: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ecad00 of size 256 next 17
2020-06-23 23:06:55.572827: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ecae00 of size 1024 next 18
2020-06-23 23:06:55.572896: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ecb200 of size 256 next 19
2020-06-23 23:06:55.572969: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ecb300 of size 78592 next 20
2020-06-23 23:06:55.573047: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ede600 of size 256 next 21
2020-06-23 23:06:55.573126: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ede700 of size 256 next 22
2020-06-23 23:06:55.573207: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ede800 of size 512 next 23
2020-06-23 23:06:55.573284: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edea00 of size 256 next 24
2020-06-23 23:06:55.573366: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edeb00 of size 256 next 25
2020-06-23 23:06:55.573450: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edec00 of size 256 next 26
2020-06-23 23:06:55.573533: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eded00 of size 256 next 38
2020-06-23 23:06:55.573623: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edee00 of size 256 next 36
2020-06-23 23:06:55.573706: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edef00 of size 256 next 28
2020-06-23 23:06:55.573786: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf000 of size 256 next 51
2020-06-23 23:06:55.573869: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf100 of size 256 next 41
2020-06-23 23:06:55.573952: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf200 of size 256 next 33
2020-06-23 23:06:55.574034: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf300 of size 256 next 70
2020-06-23 23:06:55.574115: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf400 of size 256 next 27
2020-06-23 23:06:55.574197: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00edf500 of size 768 next 53
2020-06-23 23:06:55.574280: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf800 of size 256 next 59
2020-06-23 23:06:55.574361: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edf900 of size 256 next 60
2020-06-23 23:06:55.574443: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00edfa00 of size 256 next 55
2020-06-23 23:06:55.574526: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edfb00 of size 256 next 61
2020-06-23 23:06:55.574608: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edfc00 of size 256 next 62
2020-06-23 23:06:55.574691: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edfd00 of size 256 next 57
2020-06-23 23:06:55.574773: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00edfe00 of size 1024 next 58
2020-06-23 23:06:55.574855: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0200 of size 1024 next 63
2020-06-23 23:06:55.574934: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0600 of size 256 next 64
2020-06-23 23:06:55.575016: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0700 of size 256 next 65
2020-06-23 23:06:55.575099: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0800 of size 256 next 66
2020-06-23 23:06:55.575180: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0900 of size 256 next 67
2020-06-23 23:06:55.575262: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0a00 of size 256 next 68
2020-06-23 23:06:55.575345: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ee0b00 of size 1024 next 69
2020-06-23 23:06:55.575426: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00ee0f00 of size 72192 next 30
2020-06-23 23:06:55.575509: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef2900 of size 256 next 40
2020-06-23 23:06:55.575592: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef2a00 of size 256 next 34
2020-06-23 23:06:55.575675: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef2b00 of size 256 next 39
2020-06-23 23:06:55.575755: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef2c00 of size 256 next 37
2020-06-23 23:06:55.575838: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef2d00 of size 256 next 29
2020-06-23 23:06:55.575920: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef2e00 of size 1024 next 42
2020-06-23 23:06:55.576000: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3200 of size 256 next 43
2020-06-23 23:06:55.576082: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3300 of size 256 next 45
2020-06-23 23:06:55.576472: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3400 of size 256 next 46
2020-06-23 23:06:55.576702: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3500 of size 256 next 47
2020-06-23 23:06:55.576770: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3600 of size 256 next 48
2020-06-23 23:06:55.576824: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3700 of size 256 next 49
2020-06-23 23:06:55.576879: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3800 of size 1024 next 50
2020-06-23 23:06:55.577084: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef3c00 of size 1024 next 31
2020-06-23 23:06:55.577237: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ef4000 of size 153856 next 32
2020-06-23 23:06:55.577332: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f19900 of size 78592 next 44
2020-06-23 23:06:55.578244: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f2cc00 of size 78592 next 35
2020-06-23 23:06:55.578343: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00f3ff00 of size 78592 next 52
2020-06-23 23:06:55.578413: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f53200 of size 78592 next 56
2020-06-23 23:06:55.578477: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f66500 of size 78592 next 54
2020-06-23 23:06:55.578540: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f79800 of size 78592 next 71
2020-06-23 23:06:55.578605: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f8cb00 of size 31360000 next 72
2020-06-23 23:06:55.578670: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f02d74f00 of size 636899584 next 18446744073709551615
2020-06-23 23:06:55.578729: I tensorflow/core/common_runtime/bfc_allocator.cc:955]      Summary of in-use Chunks by size:
2020-06-23 23:06:55.578810: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 47 Chunks of size 256 totalling 11.8KiB
2020-06-23 23:06:55.578881: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 512 totalling 512B
2020-06-23 23:06:55.578949: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 9 Chunks of size 1024 totalling 9.0KiB
2020-06-23 23:06:55.579019: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 1280 totalling 1.2KiB
2020-06-23 23:06:55.579093: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 7 Chunks of size 78592 totalling 537.2KiB
2020-06-23 23:06:55.579167: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 153856 totalling 150.2KiB
2020-06-23 23:06:55.579239: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 156160 totalling 152.5KiB
2020-06-23 23:06:55.579313: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 31360000 totalling 29.91MiB
2020-06-23 23:06:55.579384: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 30.75MiB
2020-06-23 23:06:55.579448: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 669294592 memory_limit_: 669294592 available bytes: 0 curr_region_allocation_bytes_: 1338589184
2020-06-23 23:06:55.579539: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats:
Limit:                   669294592
InUse:                    32243200
MaxInUse:                 53798144
NumAllocs:                    5871
MaxAllocSize:             31360000

2020-06-23 23:06:55.579628: W tensorflow/core/common_runtime/bfc_allocator.cc:429] *****_______________________________________________________________________________________________
2020-06-23 23:06:55.579734: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2020-06-23 23:06:55.579827: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (181) - OutOfMemory Error in GpuMemory: 0
2020-06-23 23:06:55.835307: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (181) - OutOfMemory Error in GpuMemory: 0
2020-06-23 23:06:55.835502: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:751] Engine creation for PartitionedCall/TRTEngineOp_0 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2020-06-23 23:06:55.939306: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 315.07MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-06-23 23:06:55.939378: W tensorflow/core/kernels/gpu_utils.cc:49] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2020-06-23 23:06:57.479074: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 763.68MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-06-23 23:06:57.479292: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 846.08MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-06-23 23:06:57.479440: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 472.58MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-06-23 23:06:57.485852: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.12GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Saving model...
Finished TRT conversion; time taken = 66.443 s
(10000, 10) (10000,)
Accuracy of predictions = 88.48 %
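
The out-of-memory messages in the log above come down to simple arithmetic: TensorRT asked the TensorFlow allocator for 1073741824 bytes, while the allocator's whole region was limited to 669294592 bytes. The sketch below only redoes that arithmetic with numbers copied from the log; the variable names are made up for illustration and are not part of TensorFlow or TensorRT.

```python
# Numbers copied from the BFC allocator dump above; names are illustrative.
MIB = 2 ** 20

requested_bytes = 1073741824      # allocation requested by TensorRT (1.00GiB)
memory_limit_bytes = 669294592    # "Limit" / memory_limit_ in the dump
in_use_bytes = 32243200           # "InUse" in the dump
largest_free_bytes = 636899584    # the final "Free" chunk in the dump

# The request can only succeed if a single free chunk is large enough.
fits = requested_bytes <= largest_free_bytes
shortfall_bytes = requested_bytes - memory_limit_bytes

print("Requested:          {:8.2f} MiB".format(requested_bytes / MIB))
print("Allocator limit:    {:8.2f} MiB".format(memory_limit_bytes / MIB))
print("In use:             {:8.2f} MiB".format(in_use_bytes / MIB))
print("Largest free chunk: {:8.2f} MiB".format(largest_free_bytes / MIB))
print("Request fits:       {}".format(fits))
```

Because the request is larger than the allocator's entire limit, freeing the in-use chunks would not have helped. This is consistent with the log reporting that engine creation failed and that the native segment was used instead.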

## z2_trt_infer.py
"""
Example script which loads a model that has already been converted to TensorRT
format (e.g. by trt_convert.py), and performs inference on a subset of the
MNIST test set using the NVIDIA GPU.

Useful resources:
- https://stackoverflow.com/questions/58846828/how-to-convert-tensorflow-2-0-savedmodel-to-tensorrt
- https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel
- https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
- https://github.com/tensorflow/tensorflow/issues/34339
- https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py

Tested on the NVIDIA Jetson Nano, Python 3.6.9, tensorflow 2.1.0+nv20.4, numpy
1.16.1
"""
import os
from time import perf_counter
import numpy as np

t0 = perf_counter()

import tensorflow as tf
from tensorflow.keras import datasets
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.framework import convert_to_constants
tf.compat.v1.enable_eager_execution() # see github issue above

# Get training and test data
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1) / 255.0
x_test = np.expand_dims(x_test, -1) / 255.0

# TEMPORARY: just use 100 test points to minimise GPU memory usage
num_points = 100
x_test, y_test = x_test[:num_points], y_test[:num_points]

current_dir = os.path.dirname(os.path.abspath(__file__))
trt_model_dir = os.path.join(current_dir, "CNN_MNIST_TRT")
# Make predictions using the saved model, and print the results. NB this uses
# tf.compat.v1.saved_model.load_v2, which is an alias for tf.saved_model.load,
# because calling tf.saved_model.load directly throws an error (for some
# reason it expects a sess)
saved_model_loaded = tf.compat.v1.saved_model.load_v2(
    export_dir=trt_model_dir, tags=[tag_constants.SERVING])
graph_func = saved_model_loaded.signatures[
    signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
graph_func = convert_to_constants.convert_variables_to_constants_v2(graph_func)
x_test_tensor = tf.convert_to_tensor(x_test, dtype=tf.float32)
preds = graph_func(x_test_tensor)[0].numpy()
print(preds.shape, y_test.shape)
accuracy = np.mean(preds.argmax(axis=1) == y_test)
print("Accuracy of predictions = {:.2f} %".format(accuracy * 100))

t1 = perf_counter()
print("Finished inference; time taken = {:.3f} s".format(t1 - t0))
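
The accuracy printed by the script is the fraction of test points whose highest-scoring class equals the label. Below is a minimal sketch of that calculation on made-up data; `fake_preds` and `fake_labels` are hypothetical stand-ins for `preds` and `y_test`, and no model or GPU is involved.

```python
import numpy as np

# Hypothetical model outputs: one row per sample, one column per class.
fake_preds = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])
# Hypothetical ground-truth labels for the same four samples.
fake_labels = np.array([1, 0, 2, 1])

# Predicted class = index of the largest score in each row.
predicted_classes = fake_preds.argmax(axis=1)

# The mean of a boolean array is the fraction of True entries.
accuracy = np.mean(predicted_classes == fake_labels)
print("Accuracy of predictions = {:.2f} %".format(accuracy * 100))
```

Three of the four made-up samples are classified correctly here, so the fraction is 0.75.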

## z3_trt_infer_console_output.txt
2020-06-24 15:57:13.306180: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-24 15:57:17.265242: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-06-24 15:57:17.267596: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-06-24 15:57:24.338597: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-06-24 15:57:24.350070: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.350220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-24 15:57:24.350275: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-24 15:57:24.350376: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-24 15:57:24.358709: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-24 15:57:24.359556: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-24 15:57:24.363972: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-24 15:57:24.367787: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-24 15:57:24.367958: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-24 15:57:24.368179: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.368397: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.368471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-24 15:57:24.389728: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-06-24 15:57:24.390312: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2df0b990 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-24 15:57:24.390362: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-24 15:57:24.472417: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.472715: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2de71490 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-24 15:57:24.472760: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-06-24 15:57:24.473079: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.473188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-24 15:57:24.473253: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-24 15:57:24.473298: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-24 15:57:24.473357: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-24 15:57:24.473485: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-24 15:57:24.473540: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-24 15:57:24.473581: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-24 15:57:24.473613: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-24 15:57:24.473750: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.473918: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:24.473986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-24 15:57:24.474055: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-24 15:57:27.702016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-24 15:57:27.702100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-06-24 15:57:27.702127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-06-24 15:57:27.702457: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:27.702700: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:27.702847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 427 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-06-24 15:57:30.874434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:30.874586: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-06-24 15:57:30.874785: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-06-24 15:57:30.876099: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:30.876252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2020-06-24 15:57:30.876314: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-06-24 15:57:30.876368: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-24 15:57:30.876435: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-06-24 15:57:30.876484: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-06-24 15:57:30.876528: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-06-24 15:57:30.876569: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-06-24 15:57:30.876602: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-06-24 15:57:30.876772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:30.876942: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:30.877008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-06-24 15:57:30.877064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-24 15:57:30.877095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-06-24 15:57:30.877114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-06-24 15:57:30.877326: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:30.877669: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-06-24 15:57:30.877779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 427 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2020-06-24 15:57:30.969987: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: graph_to_optimize
2020-06-24 15:57:30.970066: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: Graph size after: 12 nodes (9), 12 edges (10), time = 4.114ms.
2020-06-24 15:57:30.970088: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: Graph size after: 12 nodes (0), 12 edges (0), time = 2.048ms.
2020-06-24 15:57:30.970107: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841] Optimization results for grappler item: __inference_TRTEngineOp_0_native_segment_23
2020-06-24 15:57:30.970125: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: function_optimizer did nothing. time = 0.003ms.
2020-06-24 15:57:30.970142: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:843]   function_optimizer: function_optimizer did nothing. time = 0.001ms.
2020-06-24 15:57:31.514423: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for PartitionedCall/TRTEngineOp_0 with input shapes: [[100,28,28,1]]
2020-06-24 15:57:31.514576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-06-24 15:57:31.515111: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-06-24 15:57:43.582046: W tensorflow/core/common_runtime/bfc_allocator.cc:424] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.00GiB (rounded to 1073742336).  Current allocation summary follows.
2020-06-24 15:57:43.583771: I tensorflow/core/common_runtime/bfc_allocator.cc:894] BFCAllocator dump for GPU_0_bfc
2020-06-24 15:57:43.584210: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (256): Total Chunks: 15, Chunks in use: 14. 3.8KiB allocated for chunks. 3.5KiB in use in bin. 276B client-requested in use in bin.
2020-06-24 15:57:43.584547: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.584876: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1024): Total Chunks: 4, Chunks in use: 4. 4.2KiB allocated for chunks. 4.2KiB in use in bin. 3.9KiB client-requested in use in bin.
2020-06-24 15:57:43.585184: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.585710: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.586058: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.586349: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.586626: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.586984: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (65536): Total Chunks: 3, Chunks in use: 2. 231.0KiB allocated for chunks. 153.5KiB in use in bin. 153.1KiB client-requested in use in bin.
2020-06-24 15:57:43.587335: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (131072): Total Chunks: 1, Chunks in use: 1. 150.2KiB allocated for chunks. 150.2KiB in use in bin. 76.6KiB client-requested in use in bin.
2020-06-24 15:57:43.587676: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (262144): Total Chunks: 1, Chunks in use: 1. 306.2KiB allocated for chunks. 306.2KiB in use in bin. 306.2KiB client-requested in use in bin.
2020-06-24 15:57:43.587963: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.588241: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.588514: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.588783: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.589057: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.589350: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.589882: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.590176: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.590460: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.590612: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (268435456): Total Chunks: 1, Chunks in use: 0. 426.68MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-06-24 15:57:43.590723: I tensorflow/core/common_runtime/bfc_allocator.cc:917] Bin for 1.00GiB was 256.00MiB, Chunk State:
2020-06-24 15:57:43.590879: I tensorflow/core/common_runtime/bfc_allocator.cc:923]   Size: 426.68MiB | Requested Size: 76.6KiB | in_use: 0 | bin_num: 20, prev:   Size: 306.2KiB | Requested Size: 306.2KiB | in_use: 1 | bin_num: -1
2020-06-24 15:57:43.590989: I tensorflow/core/common_runtime/bfc_allocator.cc:930] Next region of size 448122880
2020-06-24 15:57:43.591128: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eb0000 of size 1280 next 1
2020-06-24 15:57:43.591256: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eb0500 of size 256 next 2
2020-06-24 15:57:43.591374: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eb0600 of size 256 next 24
2020-06-24 15:57:43.591495: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eb0700 of size 256 next 4
2020-06-24 15:57:43.591618: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eb0800 of size 256 next 5
2020-06-24 15:57:43.591745: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00eb0900 of size 79360 next 6
2020-06-24 15:57:43.591866: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec3f00 of size 256 next 12
2020-06-24 15:57:43.591993: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4000 of size 256 next 8
2020-06-24 15:57:43.592114: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00ec4100 of size 256 next 9
2020-06-24 15:57:43.592232: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4200 of size 256 next 13
2020-06-24 15:57:43.592353: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4300 of size 256 next 14
2020-06-24 15:57:43.592484: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4400 of size 1024 next 15
2020-06-24 15:57:43.592608: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4800 of size 256 next 16
2020-06-24 15:57:43.592730: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4900 of size 256 next 18
2020-06-24 15:57:43.592852: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4a00 of size 256 next 19
2020-06-24 15:57:43.592971: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4b00 of size 256 next 20
2020-06-24 15:57:43.593095: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4c00 of size 256 next 21
2020-06-24 15:57:43.593220: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4d00 of size 256 next 22
2020-06-24 15:57:43.593340: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec4e00 of size 1024 next 23
2020-06-24 15:57:43.593584: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec5200 of size 1024 next 10
2020-06-24 15:57:43.593726: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00ec5600 of size 153856 next 7
2020-06-24 15:57:43.593801: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00eeaf00 of size 78592 next 17
2020-06-24 15:57:43.593866: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00efe200 of size 78592 next 3
2020-06-24 15:57:43.593923: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at f00f11500 of size 313600 next 11
2020-06-24 15:57:43.593983: I tensorflow/core/common_runtime/bfc_allocator.cc:950] Free  at f00f5de00 of size 447410688 next 18446744073709551615
2020-06-24 15:57:43.594039: I tensorflow/core/common_runtime/bfc_allocator.cc:955]      Summary of in-use Chunks by size:
2020-06-24 15:57:43.594117: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 14 Chunks of size 256 totalling 3.5KiB
2020-06-24 15:57:43.594197: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 3 Chunks of size 1024 totalling 3.0KiB
2020-06-24 15:57:43.594274: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 1280 totalling 1.2KiB
2020-06-24 15:57:43.594359: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 2 Chunks of size 78592 totalling 153.5KiB
2020-06-24 15:57:43.594449: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 153856 totalling 150.2KiB
2020-06-24 15:57:43.594539: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 313600 totalling 306.2KiB
2020-06-24 15:57:43.594629: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 617.8KiB
2020-06-24 15:57:43.594716: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 448122880 memory_limit_: 448122880 available bytes: 0 curr_region_allocation_bytes_: 896245760
2020-06-24 15:57:43.594827: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats:
Limit:                   448122880
InUse:                      632576
MaxInUse:                   632576
NumAllocs:                      87
MaxAllocSize:               313600

2020-06-24 15:57:43.594929: W tensorflow/core/common_runtime/bfc_allocator.cc:429] *___________________________________________________________________________________________________
2020-06-24 15:57:43.595095: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2020-06-24 15:57:43.595280: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (181) - OutOfMemory Error in GpuMemory: 0
2020-06-24 15:57:43.610665: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger /home/jenkins/workspace/TensorRT/helpers/rel-7.1/L1_Nightly_Internal/build/source/rtSafe/resources.h (181) - OutOfMemory Error in GpuMemory: 0
2020-06-24 15:57:43.610845: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:751] Engine creation for PartitionedCall/TRTEngineOp_0 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2020-06-24 15:57:43.615932: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-06-24 15:57:43.620093: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
(100, 10) (100,)
Accuracy of predictions = 89.00 %
Finished inference; time taken = 36.599 s
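
The allocator dump above boils down to one mismatch: TensorRT requested 1073741824 bytes while the allocator's `memory_limit_` was 448122880 bytes, so engine creation failed and the native TensorFlow segment was used instead. Below is a minimal sketch for pulling those two figures out of a log like this one and expressing them in MiB. The helper names (`to_mib`, `requested_bytes`, `allocator_limit_bytes`) are hypothetical, and the regular expressions assume the exact message wording shown above.

```python
import re

# Two fragments copied from the console output above
LOG = """\
memory_limit_: 448122880 available bytes: 0 curr_region_allocation_bytes_: 896245760
DefaultLogger Requested amount of GPU memory (1073741824 bytes) could not be allocated.
"""

def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / 2**20

def requested_bytes(log):
    """Return the size of the failed TensorRT allocation, or None if absent."""
    match = re.search(r"Requested amount of GPU memory \((\d+) bytes\)", log)
    return int(match.group(1)) if match else None

def allocator_limit_bytes(log):
    """Return the BFC allocator memory limit, or None if absent."""
    match = re.search(r"memory_limit_: (\d+)", log)
    return int(match.group(1)) if match else None

requested = requested_bytes(LOG)
limit = allocator_limit_bytes(LOG)
print("Requested: {:.2f} MiB".format(to_mib(requested)))
print("Limit:     {:.2f} MiB".format(to_mib(limit)))
print("Shortfall: {:.2f} MiB".format(to_mib(requested - limit)))
```

For this log the request is 1024 MiB against a limit of roughly 427 MiB, which is consistent with the out-of-memory errors reported by `trt_logger.cc`.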

## z4_trt_infer_new.py
"""
Example script which loads the CNN that was previously converted to TensorRT
format by trt_convert.py, and performs inference on a subset of MNIST, with a
GPU memory limit and separate timings for the whole script and for inference.

Useful resources:
- https://stackoverflow.com/questions/58846828/how-to-convert-tensorflow-2-0-savedmodel-to-tensorrt
- https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel
- https://www.tensorflow.org/api_docs/python/tf/experimental/tensorrt/Converter
- https://github.com/tensorflow/tensorflow/issues/34339
- https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py

Tested on the NVIDIA Jetson Nano, Python 3.6.9, tensorflow 2.1.0+nv20.4, numpy
1.16.1
"""
import os
from time import perf_counter
import numpy as np

t0 = perf_counter()

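import tensorflow as tf
# Limit GPU memory to 2 GB (memory_limit is in MB). This is done straight after
# importing tensorflow and before importing anything else
gpus = tf.config.experimental.list_physical_devices('GPU')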
for gpu in gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpu,
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]
    )
from tensorflow.keras import datasets
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.framework import convert_to_constants
tf.compat.v1.enable_eager_execution() # see github issue above

# Get training and test data
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1) / 255.0
x_test = np.expand_dims(x_test, -1) / 255.0

# TEMPORARY: just use 100 test points to minimise GPU memory
num_points = 100
x_test, y_test = x_test[:num_points], y_test[:num_points]

current_dir = os.path.dirname(os.path.abspath(__file__))
trt_model_dir = os.path.join(current_dir, "CNN_MNIST_TRT")
# Load the converted model, make predictions, and print the results (NB using
# tf.compat.v1.saved_model.load_v2, an alias for tf.saved_model.load, because
# calling the latter directly throws an error: it appears to expect a session)
saved_model_loaded = tf.compat.v1.saved_model.load_v2(
    export_dir=trt_model_dir, tags=[tag_constants.SERVING])
graph_func = saved_model_loaded.signatures[
    signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
graph_func = convert_to_constants.convert_variables_to_constants_v2(graph_func)
x_test_tensor = tf.convert_to_tensor(x_test, dtype=tf.float32)
t1 = perf_counter()
preds = graph_func(x_test_tensor)[0].numpy()
t2 = perf_counter()
print(preds.shape, y_test.shape)
accuracy = np.mean(preds.argmax(axis=1) == y_test)
print("Accuracy of predictions = {:.2f} %".format(accuracy * 100))

t3 = perf_counter()
print("Finished inference")
print("Time taken (total) = {:.3f} s".format(t3 - t0))
print("Time taken (inference) = {:.3f} s".format(t2 - t1))
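
The `t2 - t1` measurement above covers only the first call to `graph_func`. The earlier console output shows TF-TRT attempting to create the engine during inference, so the first call may include one-off work that later calls would not repeat. A minimal sketch of timing warm-up calls separately from later calls is given below. `time_calls` is a hypothetical helper, and `fake_inference` is a stand-in for something like `lambda: graph_func(x_test_tensor)[0].numpy()`, so that the sketch runs without a GPU.

```python
from time import perf_counter, sleep

def time_calls(fn, n_warmup=1, n_runs=5):
    """Return (warm-up times, later times) in seconds for repeated calls to fn."""
    def timed():
        start = perf_counter()
        fn()
        return perf_counter() - start

    warmup = [timed() for _ in range(n_warmup)]
    steady = [timed() for _ in range(n_runs)]
    return warmup, steady

def fake_inference():
    # Stand-in for the real inference call (assumption: replace with graph_func)
    sleep(0.001)

warmup_times, steady_times = time_calls(fake_inference, n_warmup=1, n_runs=5)
mean_steady = sum(steady_times) / len(steady_times)
print("Time taken (first call) = {:.3f} s".format(warmup_times[0]))
print("Time taken (mean of later calls) = {:.3f} s".format(mean_steady))
```

Reporting the first call and the mean of the later calls separately makes it easier to tell whether a slow run is dominated by start-up cost or by inference itself.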