@harish2704
Last active December 30, 2019 17:56

ROCm vs CUDA performance comparison based on training the image_ocr example from Keras

CUDA run on Nvidia Tesla P100-PCIE-16GB:

```
python3 examples/image_ocr.py
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:541: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4479: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4267: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4432: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:66: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
the_input (InputLayer) (None, 128, 64, 1) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 128, 64, 16) 160 the_input[0][0]
__________________________________________________________________________________________________
max1 (MaxPooling2D) (None, 64, 32, 16) 0 conv1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, 64, 32, 16) 2320 max1[0][0]
__________________________________________________________________________________________________
max2 (MaxPooling2D) (None, 32, 16, 16) 0 conv2[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (None, 32, 256) 0 max2[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, 32, 32) 8224 reshape[0][0]
__________________________________________________________________________________________________
gru1 (GRU) (None, 32, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
gru1_b (GRU) (None, 32, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 32, 512) 0 gru1[0][0]
gru1_b[0][0]
__________________________________________________________________________________________________
gru2 (GRU) (None, 32, 512) 1574400 add_1[0][0]
__________________________________________________________________________________________________
gru2_b (GRU) (None, 32, 512) 1574400 add_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 32, 1024) 0 gru2[0][0]
gru2_b[0][0]
__________________________________________________________________________________________________
dense2 (Dense) (None, 32, 28) 28700 concatenate_1[0][0]
__________________________________________________________________________________________________
softmax (Activation) (None, 32, 28) 0 dense2[0][0]
==================================================================================================
Total params: 4,862,444
Trainable params: 4,862,444
Non-trainable params: 0
__________________________________________________________________________________________________
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4551: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:793: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1033: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1020: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3005: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
Epoch 1/20
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:197: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2019-12-30 14:41:49.026347: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2019-12-30 14:41:49.047770: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000170000 Hz
2019-12-30 14:41:49.048317: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2ddad80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2019-12-30 14:41:49.048356: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2019-12-30 14:41:49.054308: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-12-30 14:41:49.269336: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-30 14:41:49.270173: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2ddaf40 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2019-12-30 14:41:49.270202: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-12-30 14:41:49.271473: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-30 14:41:49.272461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
2019-12-30 14:41:49.284335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-30 14:41:49.505987: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-30 14:41:49.636105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-12-30 14:41:49.653096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-12-30 14:41:49.921397: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-12-30 14:41:49.940272: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-12-30 14:41:50.451087: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-30 14:41:50.451270: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-30 14:41:50.451997: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-30 14:41:50.452577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-30 14:41:50.455852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-12-30 14:41:50.457216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-30 14:41:50.457246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-12-30 14:41:50.457257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-12-30 14:41:50.458468: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-30 14:41:50.459135: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-30 14:41:50.459716: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-12-30 14:41:50.459756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15216 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:207: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:216: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:223: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2019-12-30 14:41:58.521534: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-30 14:42:00.035858: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
400/400 [==============================] - 105s 262ms/step - loss: 42.1394 - val_loss: 47.7223
Out of 256 samples: Mean edit distance:3.496 Mean normalized edit distance: 1.000
Epoch 2/20
400/400 [==============================] - 89s 221ms/step - loss: 42.0470 - val_loss: 47.7538
Out of 256 samples: Mean edit distance:3.531 Mean normalized edit distance: 1.000
Epoch 3/20
400/400 [==============================] - 87s 217ms/step - loss: 42.0483 - val_loss: 47.7493
Out of 256 samples: Mean edit distance:3.547 Mean normalized edit distance: 1.000
Epoch 4/20
400/400 [==============================] - 85s 213ms/step - loss: 42.0706 - val_loss: 47.6910
Out of 256 samples: Mean edit distance:3.523 Mean normalized edit distance: 1.000
Epoch 5/20
400/400 [==============================] - 84s 209ms/step - loss: 42.0256 - val_loss: 47.7445
Out of 256 samples: Mean edit distance:3.477 Mean normalized edit distance: 1.000
Epoch 6/20
400/400 [==============================] - 83s 207ms/step - loss: 42.0379 - val_loss: 47.7841
Out of 256 samples: Mean edit distance:3.539 Mean normalized edit distance: 1.000
Epoch 7/20
400/400 [==============================] - 82s 206ms/step - loss: 42.0434 - val_loss: 47.6471
Out of 256 samples: Mean edit distance:3.527 Mean normalized edit distance: 1.000
Epoch 8/20
400/400 [==============================] - 82s 205ms/step - loss: 42.0636 - val_loss: 47.7264
Out of 256 samples: Mean edit distance:3.574 Mean normalized edit distance: 1.000
Epoch 9/20
400/400 [==============================] - 81s 204ms/step - loss: 42.0013 - val_loss: 47.6823
Out of 256 samples: Mean edit distance:3.566 Mean normalized edit distance: 1.000
Epoch 10/20
400/400 [==============================] - 85s 212ms/step - loss: 42.0180 - val_loss: 47.7052
Out of 256 samples: Mean edit distance:3.566 Mean normalized edit distance: 1.000
Epoch 11/20
400/400 [==============================] - 85s 212ms/step - loss: 42.0193 - val_loss: 47.7396
Out of 256 samples: Mean edit distance:3.609 Mean normalized edit distance: 1.000
Epoch 12/20
400/400 [==============================] - 85s 212ms/step - loss: 42.0346 - val_loss: 47.6906
Out of 256 samples: Mean edit distance:3.551 Mean normalized edit distance: 1.000
Epoch 13/20
400/400 [==============================] - 85s 211ms/step - loss: 42.0782 - val_loss: 47.7711
Out of 256 samples: Mean edit distance:3.508 Mean normalized edit distance: 1.000
Epoch 14/20
400/400 [==============================] - 84s 210ms/step - loss: 42.0240 - val_loss: 47.8202
Out of 256 samples: Mean edit distance:3.539 Mean normalized edit distance: 1.000
Epoch 15/20
400/400 [==============================] - 85s 211ms/step - loss: 42.0326 - val_loss: 47.6735
Out of 256 samples: Mean edit distance:3.523 Mean normalized edit distance: 1.000
Epoch 16/20
400/400 [==============================] - 84s 211ms/step - loss: 42.0295 - val_loss: 47.7353
Out of 256 samples: Mean edit distance:3.480 Mean normalized edit distance: 1.000
Epoch 17/20
400/400 [==============================] - 84s 210ms/step - loss: 42.0263 - val_loss: 47.7402
Out of 256 samples: Mean edit distance:3.492 Mean normalized edit distance: 1.000
Epoch 18/20
400/400 [==============================] - 84s 210ms/step - loss: 41.9883 - val_loss: 47.6956
Out of 256 samples: Mean edit distance:3.531 Mean normalized edit distance: 1.000
Epoch 19/20
400/400 [==============================] - 84s 210ms/step - loss: 42.0159 - val_loss: 47.7222
Out of 256 samples: Mean edit distance:3.551 Mean normalized edit distance: 1.000
Epoch 20/20
400/400 [==============================] - 83s 209ms/step - loss: 42.0260 - val_loss: 47.7082
Out of 256 samples: Mean edit distance:3.559 Mean normalized edit distance: 1.000
Model: "model_3"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
the_input (InputLayer) (None, 512, 64, 1) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 512, 64, 16) 160 the_input[0][0]
__________________________________________________________________________________________________
max1 (MaxPooling2D) (None, 256, 32, 16) 0 conv1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, 256, 32, 16) 2320 max1[0][0]
__________________________________________________________________________________________________
max2 (MaxPooling2D) (None, 128, 16, 16) 0 conv2[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (None, 128, 256) 0 max2[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, 128, 32) 8224 reshape[0][0]
__________________________________________________________________________________________________
gru1 (GRU) (None, 128, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
gru1_b (GRU) (None, 128, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 128, 512) 0 gru1[0][0]
gru1_b[0][0]
__________________________________________________________________________________________________
gru2 (GRU) (None, 128, 512) 1574400 add_2[0][0]
__________________________________________________________________________________________________
gru2_b (GRU) (None, 128, 512) 1574400 add_2[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 128, 1024) 0 gru2[0][0]
gru2_b[0][0]
__________________________________________________________________________________________________
dense2 (Dense) (None, 128, 28) 28700 concatenate_2[0][0]
__________________________________________________________________________________________________
softmax (Activation) (None, 128, 28) 0 dense2[0][0]
==================================================================================================
Total params: 4,862,444
Trainable params: 4,862,444
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 21/25
400/400 [==============================] - 303s 758ms/step - loss: 37.3943 - val_loss: 42.5088
Out of 256 samples: Mean edit distance:3.520 Mean normalized edit distance: 1.000
Epoch 22/25
400/400 [==============================] - 301s 752ms/step - loss: 89.2964 - val_loss: 95.7861
Out of 256 samples: Mean edit distance:7.988 Mean normalized edit distance: 1.000
Epoch 23/25
400/400 [==============================] - 305s 762ms/step - loss: 91.3408 - val_loss: 103.0809
Out of 256 samples: Mean edit distance:8.066 Mean normalized edit distance: 1.000
Epoch 24/25
400/400 [==============================] - 318s 795ms/step - loss: 91.4558 - val_loss: 103.4552
Out of 256 samples: Mean edit distance:8.223 Mean normalized edit distance: 1.000
Epoch 25/25
400/400 [==============================] - 325s 814ms/step - loss: 91.4825 - val_loss: 104.3913
Out of 256 samples: Mean edit distance:8.230 Mean normalized edit distance: 1.000
```
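Both model summaries in the log above report 4,862,444 parameters, even though model_3 takes 512-pixel-wide images instead of 128. The input width only changes the number of time steps fed to the GRUs (32 vs 128), not any weight shape. The arithmetic below is a minimal sketch that reproduces the per-layer counts; it assumes 3x3 convolution kernels (consistent with 160 = 3·3·1·16 + 16) and the Keras 2.x GRU default `reset_after=False`, which is the formula the reported 837,120 and 1,574,400 match. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_ch, out_ch):
    # Kernel weights plus one bias per output channel.
    return kernel_h * kernel_w * in_ch * out_ch + out_ch


def dense_params(in_dim, units):
    # Weight matrix plus one bias per unit.
    return in_dim * units + units


def gru_params(in_dim, units):
    # Assumes Keras 2.x GRU with reset_after=False: 3 gates, each with an
    # input kernel, a recurrent kernel and a bias vector.
    return 3 * (in_dim * units + units * units + units)


layers = {
    "conv1": conv2d_params(3, 3, 1, 16),
    "conv2": conv2d_params(3, 3, 16, 16),
    "dense1": dense_params(256, 32),   # reshape: 16 rows * 16 channels = 256 features per step
    "gru1": gru_params(32, 512),
    "gru1_b": gru_params(32, 512),
    "gru2": gru_params(512, 512),      # input is add_1, which is 512-dimensional
    "gru2_b": gru_params(512, 512),
    "dense2": dense_params(1024, 28),  # concatenation of two 512-d GRU outputs
}

total = sum(layers.values())
print(total)  # 4862444
```

No term depends on the sequence length (32 or 128), so the same total applies to model_1 and model_3; the larger model costs more per step because the GRUs run over four times as many time steps, not because it has more weights.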

ROCm run on AMD Radeon RX Vega 56:

```
[hari@localhost ~/Downloads/keras-2.3.1]$ python3 examples/image_ocr.py
Using TensorFlow backend.
2019-12-30 20:25:35.117548: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2019-12-30 20:25:35.157206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1629] Found device 0 with properties:
name: Vega 10 XL/XT [Radeon RX Vega 56/64]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.59
pciBusID 0000:0a:00.0
2019-12-30 20:25:35.184890: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2019-12-30 20:25:35.185795: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2019-12-30 20:25:35.186547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2019-12-30 20:25:35.186689: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2019-12-30 20:25:35.186753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-30 20:25:35.186971: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-12-30 20:25:35.189942: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3593360000 Hz
2019-12-30 20:25:35.190280: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ada2477f40 executing computations on platform Host. Devices:
2019-12-30 20:25:35.190293: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-12-30 20:25:35.190389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1629] Found device 0 with properties:
name: Vega 10 XL/XT [Radeon RX Vega 56/64]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.59
pciBusID 0000:0a:00.0
2019-12-30 20:25:35.190406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2019-12-30 20:25:35.190416: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2019-12-30 20:25:35.190425: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2019-12-30 20:25:35.190432: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2019-12-30 20:25:35.190463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-30 20:25:35.190500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-30 20:25:35.190506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-12-30 20:25:35.190510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-12-30 20:25:35.190578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega 10 XL/XT [Radeon RX Vega 56/64], pci bus id: 0000:0a:00.0)
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
the_input (InputLayer) (None, 128, 64, 1) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 128, 64, 16) 160 the_input[0][0]
__________________________________________________________________________________________________
max1 (MaxPooling2D) (None, 64, 32, 16) 0 conv1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, 64, 32, 16) 2320 max1[0][0]
__________________________________________________________________________________________________
max2 (MaxPooling2D) (None, 32, 16, 16) 0 conv2[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (None, 32, 256) 0 max2[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, 32, 32) 8224 reshape[0][0]
__________________________________________________________________________________________________
gru1 (GRU) (None, 32, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
gru1_b (GRU) (None, 32, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 32, 512) 0 gru1[0][0]
gru1_b[0][0]
__________________________________________________________________________________________________
gru2 (GRU) (None, 32, 512) 1574400 add_1[0][0]
__________________________________________________________________________________________________
gru2_b (GRU) (None, 32, 512) 1574400 add_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 32, 1024) 0 gru2[0][0]
gru2_b[0][0]
__________________________________________________________________________________________________
dense2 (Dense) (None, 32, 28) 28700 concatenate_1[0][0]
__________________________________________________________________________________________________
softmax (Activation) (None, 32, 28) 0 dense2[0][0]
==================================================================================================
Total params: 4,862,444
Trainable params: 4,862,444
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 1/20
2019-12-30 20:25:44.418024: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2019-12-30 20:25:44.432239: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
400/400 [==============================] - 124s 311ms/step - loss: 42.1394 - val_loss: 49.8171
Out of 256 samples: Mean edit distance:3.496 Mean normalized edit distance: 1.000
Epoch 2/20
400/400 [==============================] - 115s 289ms/step - loss: 42.0860 - val_loss: 49.8461
Out of 256 samples: Mean edit distance:3.531 Mean normalized edit distance: 1.000
Epoch 3/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0466 - val_loss: 44.5114
Out of 256 samples: Mean edit distance:3.555 Mean normalized edit distance: 1.000
Epoch 4/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0138 - val_loss: 48.9335
Out of 256 samples: Mean edit distance:3.523 Mean normalized edit distance: 1.000
Epoch 5/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0395 - val_loss: 46.2727
Out of 256 samples: Mean edit distance:3.477 Mean normalized edit distance: 1.000
Epoch 6/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0567 - val_loss: 48.0469
Out of 256 samples: Mean edit distance:3.539 Mean normalized edit distance: 1.000
Epoch 7/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0235 - val_loss: 46.7224
Out of 256 samples: Mean edit distance:3.527 Mean normalized edit distance: 1.000
Epoch 8/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0072 - val_loss: 49.3743
Out of 256 samples: Mean edit distance:3.574 Mean normalized edit distance: 1.000
Epoch 9/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0692 - val_loss: 45.8388
Out of 256 samples: Mean edit distance:3.570 Mean normalized edit distance: 1.000
Epoch 10/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0262 - val_loss: 48.0622
Out of 256 samples: Mean edit distance:3.574 Mean normalized edit distance: 1.000
Epoch 11/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0037 - val_loss: 48.9347
Out of 256 samples: Mean edit distance:3.531 Mean normalized edit distance: 1.000
Epoch 12/20
400/400 [==============================] - 114s 286ms/step - loss: 42.0007 - val_loss: 47.6128
Out of 256 samples: Mean edit distance:3.520 Mean normalized edit distance: 1.000
Epoch 13/20
400/400 [==============================] - 115s 286ms/step - loss: 41.9994 - val_loss: 48.9368
Out of 256 samples: Mean edit distance:3.551 Mean normalized edit distance: 1.000
Epoch 14/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0511 - val_loss: 47.1542
Out of 256 samples: Mean edit distance:3.559 Mean normalized edit distance: 1.000
Epoch 15/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0149 - val_loss: 48.0377
Out of 256 samples: Mean edit distance:3.527 Mean normalized edit distance: 1.000
Epoch 16/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0511 - val_loss: 48.0680
Out of 256 samples: Mean edit distance:3.480 Mean normalized edit distance: 1.000
Epoch 17/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0279 - val_loss: 48.4952
Out of 256 samples: Mean edit distance:3.516 Mean normalized edit distance: 1.000
Epoch 18/20
400/400 [==============================] - 115s 287ms/step - loss: 42.0322 - val_loss: 44.9407
Out of 256 samples: Mean edit distance:3.527 Mean normalized edit distance: 1.000
Epoch 19/20
400/400 [==============================] - 115s 286ms/step - loss: 42.0471 - val_loss: 46.2739
Out of 256 samples: Mean edit distance:3.547 Mean normalized edit distance: 1.000
Epoch 20/20
400/400 [==============================] - 115s 286ms/step - loss: 42.0102 - val_loss: 47.5970
Out of 256 samples: Mean edit distance:3.566 Mean normalized edit distance: 1.000
Model: "model_3"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
the_input (InputLayer) (None, 512, 64, 1) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 512, 64, 16) 160 the_input[0][0]
__________________________________________________________________________________________________
max1 (MaxPooling2D) (None, 256, 32, 16) 0 conv1[0][0]
__________________________________________________________________________________________________
conv2 (Conv2D) (None, 256, 32, 16) 2320 max1[0][0]
__________________________________________________________________________________________________
max2 (MaxPooling2D) (None, 128, 16, 16) 0 conv2[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (None, 128, 256) 0 max2[0][0]
__________________________________________________________________________________________________
dense1 (Dense) (None, 128, 32) 8224 reshape[0][0]
__________________________________________________________________________________________________
gru1 (GRU) (None, 128, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
gru1_b (GRU) (None, 128, 512) 837120 dense1[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 128, 512) 0 gru1[0][0]
gru1_b[0][0]
__________________________________________________________________________________________________
gru2 (GRU) (None, 128, 512) 1574400 add_2[0][0]
__________________________________________________________________________________________________
gru2_b (GRU) (None, 128, 512) 1574400 add_2[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 128, 1024) 0 gru2[0][0]
gru2_b[0][0]
__________________________________________________________________________________________________
dense2 (Dense) (None, 128, 28) 28700 concatenate_2[0][0]
__________________________________________________________________________________________________
softmax (Activation) (None, 128, 28) 0 dense2[0][0]
==================================================================================================
Total params: 4,862,444
Trainable params: 4,862,444
Non-trainable params: 0
__________________________________________________________________________________________________
Epoch 21/25
400/400 [==============================] - 442s 1s/step - loss: 37.3943 - val_loss: 44.3882
Out of 256 samples: Mean edit distance:3.496 Mean normalized edit distance: 1.000
Epoch 22/25
400/400 [==============================] - 432s 1s/step - loss: 89.3359 - val_loss: 99.9920
Out of 256 samples: Mean edit distance:8.180 Mean normalized edit distance: 1.000
Epoch 23/25
400/400 [==============================] - 433s 1s/step - loss: 91.5029 - val_loss: 106.0722
Out of 256 samples: Mean edit distance:8.051 Mean normalized edit distance: 1.000
Epoch 24/25
400/400 [==============================] - 432s 1s/step - loss: 91.4113 - val_loss: 105.2302
Out of 256 samples: Mean edit distance:8.215 Mean normalized edit distance: 1.000
Epoch 25/25
400/400 [==============================] - 432s 1s/step - loss: 91.4994 - val_loss: 95.5991
Out of 256 samples: Mean edit distance:8.188 Mean normalized edit distance: 1.000
```
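Each epoch in both logs ends with a Keras progress-bar line such as `400/400 [...] - 84s 210ms/step`. To summarise a whole log instead of reading it line by line, here is a minimal standard-library sketch; the function name `step_times_ms` is hypothetical, and the regex assumes exactly the progress-bar format shown in these logs. Once a step takes a second or more, Keras rounds the display to `1s/step`, so the sketch falls back to epoch seconds divided by the step count in that case.

```python
import re

# Matches completed-epoch lines such as
#   "400/400 [====...====] - 84s 210ms/step - loss: 42.0260 - val_loss: 47.7082"
#   "400/400 [====...====] - 432s 1s/step - loss: 91.4994 - val_loss: 95.5991"
EPOCH_LINE = re.compile(
    r"^(?P<steps>\d+)/(?P=steps) \[=+\] - (?P<epoch_s>\d+)s "
    r"(?P<per_step>\d+)(?P<unit>ms|s)/step"
)


def step_times_ms(log_text):
    """Return one per-step time in milliseconds for every completed epoch."""
    times = []
    for line in log_text.splitlines():
        m = EPOCH_LINE.match(line.strip())
        if not m:
            continue  # skip "Epoch N/M", edit-distance lines, TF messages, etc.
        if m.group("unit") == "ms":
            # Keras printed the per-step time directly in milliseconds.
            times.append(float(m.group("per_step")))
        else:
            # "1s/step" is rounded to whole seconds; use epoch time / steps instead.
            times.append(int(m.group("epoch_s")) * 1000 / int(m.group("steps")))
    return times


rocm_large = """\
400/400 [==============================] - 442s 1s/step - loss: 37.3943 - val_loss: 44.3882
400/400 [==============================] - 432s 1s/step - loss: 89.3359 - val_loss: 99.9920
"""
print(step_times_ms(rocm_large))  # [1105.0, 1080.0]
```

Pasting each full log into a string and taking, for example, the median over epochs 2 to 20 avoids letting the slower first (warm-up) epoch skew the comparison.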

Specs from the official websites

| Spec | Nvidia Tesla P100 | AMD Vega 56 |
|---|---|---|
| Single-precision performance | 9.3 TFLOPS | 10.5 TFLOPS |
| Memory | 16 GB | 8 GB |
| Memory bandwidth | 732 GB/s | 410 GB/s |
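On paper the two cards are close in compute but far apart in memory. The ratios below are plain arithmetic on the spec-sheet figures in the table above; the dictionary layout is just an illustrative choice.

```python
# Spec-sheet figures: peak FP32 TFLOPS, memory size (GB), memory bandwidth (GB/s).
specs = {
    "Tesla P100": {"fp32_tflops": 9.3, "memory_gb": 16, "bandwidth_gbs": 732},
    "Vega 56": {"fp32_tflops": 10.5, "memory_gb": 8, "bandwidth_gbs": 410},
}

# Vega 56 figure divided by the P100 figure for each spec.
ratios = {
    key: specs["Vega 56"][key] / specs["Tesla P100"][key]
    for key in specs["Tesla P100"]
}

for key, ratio in ratios.items():
    print(f"{key}: Vega 56 / P100 = {ratio:.2f}")
# fp32_tflops: Vega 56 / P100 = 1.13
# memory_gb: Vega 56 / P100 = 0.50
# bandwidth_gbs: Vega 56 / P100 = 0.56
```

So the Vega 56 has about 13% more peak FP32 throughput but only about 56% of the P100's memory bandwidth.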

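Time taken per training step (same training script on both GPUs)

| Model | Nvidia Tesla P100 | AMD Vega 56 |
|---|---|---|
| Relatively small CRNN (128 px wide input, epochs 1-20) | 210 ms/step | 287 ms/step |
| Relatively large CRNN (512 px wide input, epochs 21-25) | 762 ms/step | ~1080 ms/step (432 s per 400-step epoch; Keras displays this as 1s/step) |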
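Turning the measured step times into relative numbers is simple arithmetic. Keras only prints `1s/step` once a step exceeds a second, so the Vega 56 large-model figure is derived from the 432 s epoch time divided by the 400 steps per epoch. A small sketch using the representative values from the table above:

```python
STEPS_PER_EPOCH = 400  # every progress line in both logs reads "400/400"

# Representative per-step times in ms, copied from the timing table.
step_ms = {
    "small CRNN": {"Tesla P100": 210, "Vega 56": 287},
    # Keras displayed "1s/step" for the Vega 56 here, so derive it from the 432 s epoch.
    "large CRNN": {"Tesla P100": 762, "Vega 56": 432 * 1000 / STEPS_PER_EPOCH},
}

# How many times longer each Vega 56 step takes compared with the P100.
slowdown = {model: t["Vega 56"] / t["Tesla P100"] for model, t in step_ms.items()}

for model, factor in slowdown.items():
    print(f"{model}: Vega 56 needs {factor:.2f}x the P100's time per step")
# small CRNN: Vega 56 needs 1.37x the P100's time per step
# large CRNN: Vega 56 needs 1.42x the P100's time per step
```

In other words, in these runs the Vega 56 took roughly 1.4x as long per step as the Tesla P100 for both model sizes.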