@iizukak
Created April 14, 2020 23:56
TensorFlow Lite quantization benchmark
STARTING!
Duplicate flags: num_threads
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/home/iizuka/fer2013-tf/trained_models/mobilenet_small.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input layer values files: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [1]
Max profiling buffer entries: [1024]
CSV File to export profiling data to: []
Enable platform-wide tracing: [0]
#threads used for CPU inference: [1]
Max number of delegated partitions : [0]
External delegate path : []
External delegate options : []
Use gpu : [0]
Use xnnpack : [0]
Loaded model /home/iizuka/fer2013-tf/trained_models/mobilenet_small.tflite
The input model file size (MB): 0.477872
Initialized session in 0.335ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=352 first=1625 curr=1359 min=1334 max=1842 avg=1419.58 std=71
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=634 first=1430 curr=3071 min=1360 max=3317 avg=1554.21 std=374
Inference timings in us: Init: 335, First inference: 1625, Warmup (avg): 1419.58, Inference (avg): 1554.21
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=0.5625 overall=2.61719
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.031 0.031 100.000% 100.000% 0.000 1 AllocateTensors/0
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.031 0.031 100.000% 100.000% 0.000 1 AllocateTensors/0
Number of nodes executed: 1
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
AllocateTensors 1 0.031 100.000% 100.000% 0.000 1
Timings (microseconds): count=1 curr=31
Memory (bytes): count=0
1 nodes observed
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DEPTHWISE_CONV_2D 0.000 0.064 0.071 4.541% 4.541% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:0
DEPTHWISE_CONV_2D 0.071 0.367 0.412 26.558% 31.099% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:1
CONV_2D 0.483 0.131 0.149 9.576% 40.675% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:2
DEPTHWISE_CONV_2D 0.632 0.174 0.194 12.461% 53.136% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:3
DEPTHWISE_CONV_2D 0.825 0.163 0.183 11.753% 64.888% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:4
CONV_2D 1.008 0.118 0.135 8.665% 73.553% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:5
DEPTHWISE_CONV_2D 1.143 0.080 0.089 5.744% 79.297% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:6
DEPTHWISE_CONV_2D 1.232 0.114 0.079 5.100% 84.398% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:7
CONV_2D 1.311 0.132 0.143 9.203% 93.600% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:8
DEPTHWISE_CONV_2D 1.454 0.039 0.040 2.567% 96.167% 0.000 1 [fer_small/tf_op_layer_Relu_9/Relu_9]:9
MEAN 1.494 0.034 0.034 2.174% 98.342% 0.000 1 [fer_small/global_average_pooling2d/Mean]:10
FULLY_CONNECTED 1.528 0.013 0.024 1.539% 99.881% 0.000 1 [fer_small/dense/Relu]:11
FULLY_CONNECTED 1.552 0.001 0.001 0.076% 99.957% 0.000 1 [fer_small/dense_1/BiasAdd]:12
SOFTMAX 1.553 0.000 0.001 0.043% 100.000% 0.000 1 [Identity]:13
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DEPTHWISE_CONV_2D 0.071 0.367 0.412 26.558% 26.558% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:1
DEPTHWISE_CONV_2D 0.632 0.174 0.194 12.461% 39.018% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:3
DEPTHWISE_CONV_2D 0.825 0.163 0.183 11.753% 50.771% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:4
CONV_2D 0.483 0.131 0.149 9.576% 60.347% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:2
CONV_2D 1.311 0.132 0.143 9.203% 69.549% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:8
CONV_2D 1.008 0.118 0.135 8.665% 78.214% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:5
DEPTHWISE_CONV_2D 1.143 0.080 0.089 5.744% 83.958% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:6
DEPTHWISE_CONV_2D 1.232 0.114 0.079 5.100% 89.059% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:7
DEPTHWISE_CONV_2D 0.000 0.064 0.071 4.541% 93.600% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:0
DEPTHWISE_CONV_2D 1.454 0.039 0.040 2.567% 96.167% 0.000 1 [fer_small/tf_op_layer_Relu_9/Relu_9]:9
Number of nodes executed: 14
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
DEPTHWISE_CONV_2D 7 1.064 68.867% 68.867% 0.000 7
CONV_2D 3 0.424 27.443% 96.311% 0.000 3
MEAN 1 0.033 2.136% 98.447% 0.000 1
FULLY_CONNECTED 2 0.024 1.553% 100.000% 0.000 2
SOFTMAX 1 0.000 0.000% 100.000% 0.000 1
Timings (microseconds): count=634 first=1430 curr=3069 min=1357 max=3315 avg=1552.91 std=374
Memory (bytes): count=0
14 nodes observed
STARTING!
Duplicate flags: num_threads
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/home/iizuka/fer2013-tf/trained_models/mobilenet_small_quant.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input layer values files: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [1]
Max profiling buffer entries: [1024]
CSV File to export profiling data to: []
Enable platform-wide tracing: [0]
#threads used for CPU inference: [1]
Max number of delegated partitions : [0]
External delegate path : []
External delegate options : []
Use gpu : [0]
Use xnnpack : [0]
Loaded model /home/iizuka/fer2013-tf/trained_models/mobilenet_small_quant.tflite
The input model file size (MB): 0.157232
Initialized session in 1.419ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=30 first=18720 curr=17505 min=13567 max=23425 avg=16820.9 std=1953
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=63 first=17493 curr=15778 min=13120 max=18710 avg=15881.9 std=1497
Inference timings in us: Init: 1419, First inference: 18720, Warmup (avg): 16820.9, Inference (avg): 15881.9
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=2.24219 overall=2.88672
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.086 0.086 100.000% 100.000% 1804.000 1 AllocateTensors/0
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.086 0.086 100.000% 100.000% 1804.000 1 AllocateTensors/0
Number of nodes executed: 1
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
AllocateTensors 1 0.086 100.000% 100.000% 1804.000 1
Timings (microseconds): count=1 curr=86
Memory (bytes): count=0
1 nodes observed
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
QUANTIZE 0.000 0.013 0.015 0.092% 0.092% 0.000 1 [img_int8]:0
DEPTHWISE_CONV_2D 0.015 0.185 0.218 1.374% 1.466% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:1
DEPTHWISE_CONV_2D 0.233 0.565 0.634 3.996% 5.462% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:2
CONV_2D 0.868 5.605 4.818 30.354% 35.815% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:3
DEPTHWISE_CONV_2D 5.686 0.287 0.320 2.019% 37.834% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:4
DEPTHWISE_CONV_2D 6.007 0.248 0.282 1.779% 39.613% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:5
CONV_2D 6.290 4.970 4.386 27.629% 67.243% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:6
DEPTHWISE_CONV_2D 10.677 0.126 0.146 0.922% 68.165% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:7
DEPTHWISE_CONV_2D 10.823 0.112 0.127 0.802% 68.966% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:8
CONV_2D 10.951 4.821 4.339 27.332% 96.299% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:9
DEPTHWISE_CONV_2D 15.291 0.067 0.081 0.512% 96.811% 0.000 1 [fer_small/tf_op_layer_Relu_9/Relu_9]:10
MEAN 15.373 0.051 0.044 0.276% 97.087% 0.000 1 [fer_small/global_average_pooling2d/Mean]:11
FULLY_CONNECTED 15.417 0.422 0.444 2.800% 99.887% 0.000 1 [fer_small/dense/Relu]:12
FULLY_CONNECTED 15.862 0.012 0.015 0.092% 99.980% 0.000 1 [fer_small/dense_1/BiasAdd]:13
SOFTMAX 15.877 0.002 0.002 0.011% 99.991% 0.000 1 [Identity_int8]:14
QUANTIZE 15.879 0.001 0.001 0.009% 100.000% 0.000 1 [Identity]:15
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
CONV_2D 0.868 5.605 4.818 30.354% 30.354% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:3
CONV_2D 6.290 4.970 4.386 27.629% 57.983% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:6
CONV_2D 10.951 4.821 4.339 27.332% 85.315% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:9
DEPTHWISE_CONV_2D 0.233 0.565 0.634 3.996% 89.311% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:2
FULLY_CONNECTED 15.417 0.422 0.444 2.800% 92.111% 0.000 1 [fer_small/dense/Relu]:12
DEPTHWISE_CONV_2D 5.686 0.287 0.320 2.019% 94.130% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:4
DEPTHWISE_CONV_2D 6.007 0.248 0.282 1.779% 95.909% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:5
DEPTHWISE_CONV_2D 0.015 0.185 0.218 1.374% 97.283% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:1
DEPTHWISE_CONV_2D 10.677 0.126 0.146 0.922% 98.205% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:7
DEPTHWISE_CONV_2D 10.823 0.112 0.127 0.802% 99.007% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:8
Number of nodes executed: 16
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
CONV_2D 3 13.541 85.346% 85.346% 0.000 3
DEPTHWISE_CONV_2D 7 1.808 11.395% 96.741% 0.000 7
FULLY_CONNECTED 2 0.458 2.887% 99.628% 0.000 2
MEAN 1 0.043 0.271% 99.899% 0.000 1
QUANTIZE 2 0.015 0.095% 99.994% 0.000 2
SOFTMAX 1 0.001 0.006% 100.000% 0.000 1
Timings (microseconds): count=63 first=17487 curr=15774 min=13116 max=18691 avg=15873.2 std=1494
Memory (bytes): count=0
16 nodes observed
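
The two runs above can be compared directly from their summary lines. The following is a minimal sketch, not part of the original benchmark: it copies the file size and inference timing lines verbatim from both logs and computes the ratios. The helper name `parse_summary` and the variable names are hypothetical, chosen for illustration only. On these numbers the quantized model is roughly one third of the float model's size but roughly ten times slower per inference, and the per-type summaries show CONV_2D growing from 0.424 ms to 13.541 ms. The log alone does not say why.

```python
import re

# Summary lines copied verbatim from the float-model run (mobilenet_small.tflite).
FLOAT_LOG = """\
The input model file size (MB): 0.477872
Inference timings in us: Init: 335, First inference: 1625, Warmup (avg): 1419.58, Inference (avg): 1554.21
"""

# Summary lines copied verbatim from the quantized run (mobilenet_small_quant.tflite).
QUANT_LOG = """\
The input model file size (MB): 0.157232
Inference timings in us: Init: 1419, First inference: 18720, Warmup (avg): 16820.9, Inference (avg): 15881.9
"""


def parse_summary(log: str) -> dict:
    """Extract model size (MB) and average inference time (us) from a log excerpt.

    Hypothetical helper: it only understands the two line formats shown above.
    """
    size = re.search(r"input model file size \(MB\): ([\d.]+)", log)
    avg = re.search(r"Inference \(avg\): ([\d.]+)", log)
    if size is None or avg is None:
        raise ValueError("log excerpt is missing a size or timing line")
    return {"size_mb": float(size.group(1)), "avg_us": float(avg.group(1))}


float_run = parse_summary(FLOAT_LOG)
quant_run = parse_summary(QUANT_LOG)

# Ratios of quantized to float: above 1 means the quantized model is larger or slower.
slowdown = quant_run["avg_us"] / float_run["avg_us"]
size_ratio = quant_run["size_mb"] / float_run["size_mb"]

print(f"average inference time, quant / float: {slowdown:.2f}x")
print(f"model file size, quant / float: {size_ratio:.2f}x")
```

Both runs used a single CPU thread with the GPU and XNNPACK options off, as the flag dumps show, so the ratios describe that configuration only.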