@keichi
Last active July 6, 2022 08:03
Multi-VE training using TensorFlow-VE
#! /bin/bash
#PBS -q sx
#PBS -l elapstim_req=00:05:00
#PBS --venode 2
#PBS -S /bin/bash
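# Use 8 OpenMP threads in each VE process
export VE_OMP_NUM_THREADS=8
cd "$PBS_O_WORKDIR"
# Start one worker per VE; the first worker runs in the background
VE_NODE_NUMBER=0 python multi.py &
sleep 2
VE_NODE_NUMBER=1 python multi.py
# Wait for the background worker so the job does not end before it finishes
wait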
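multi.py itself is not included here. The log below shows a two-worker cluster (worker 0 at localhost:12345, worker 1 at localhost:23456), so each process must learn its own task index somehow. The following is only a sketch of one way to do that, assuming the worker index is taken from VE_NODE_NUMBER. The helper name build_tf_config and the WORKERS list are hypothetical and not taken from the original script. Only the TF_CONFIG layout itself is standard TensorFlow.

```python
import json
import os

# Worker addresses as reported by the GrpcChannelCache lines in the log.
WORKERS = ["localhost:12345", "localhost:23456"]


def build_tf_config(ve_node_number, workers=WORKERS):
    """Return a TF_CONFIG JSON string for the worker bound to one VE.

    Assumption (not confirmed by the gist): worker index == VE_NODE_NUMBER.
    """
    index = int(ve_node_number)
    if not 0 <= index < len(workers):
        raise ValueError(f"no worker address for VE {index}")
    return json.dumps({
        "cluster": {"worker": list(workers)},
        "task": {"type": "worker", "index": index},
    })


if __name__ == "__main__":
    # TF_CONFIG has to be set before the distribution strategy is created.
    os.environ["TF_CONFIG"] = build_tf_config(os.environ.get("VE_NODE_NUMBER", "0"))
    print(os.environ["TF_CONFIG"])
```

With this approach both processes can run the same script, and the per-process VE_NODE_NUMBER set in the job script would be the only thing that differs between the two workers.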
WARNING:tensorflow:From multi.py:26: _CollectiveAllReduceStrategyExperimental.__init__ (from tensorflow.python.distribute.collective_all_reduce_strategy) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.MultiWorkerMirroredStrategy instead
WARNING:tensorflow:From multi.py:26: _CollectiveAllReduceStrategyExperimental.__init__ (from tensorflow.python.distribute.collective_all_reduce_strategy) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.MultiWorkerMirroredStrategy instead
2022-07-06 17:01:52.143647: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 17:01:52.143673: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 17:02:00.397267: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12345, 1 -> localhost:23456}
2022-07-06 17:02:00.397413: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:12345
2022-07-06 17:02:00.431113: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12345, 1 -> localhost:23456}
2022-07-06 17:02:00.431350: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:23456
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
2022-07-06 17:02:01.731374: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
key: "Toutput_types"
value {
list {
type: DT_FLOAT
type: DT_INT64
}
}
}
attr {
key: "output_shapes"
value {
list {
shape {
dim {
size: 28
}
dim {
size: 28
}
}
shape {
}
}
}
}
2022-07-06 17:02:01.742487: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
key: "Toutput_types"
value {
list {
type: DT_FLOAT
type: DT_INT64
}
}
}
attr {
key: "output_shapes"
value {
list {
shape {
dim {
size: 28
}
dim {
size: 28
}
}
shape {
}
}
}
}
2022-07-06 17:02:01.856867: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 17:02:01.868644: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2800080000 Hz
2022-07-06 17:02:01.899574: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 17:02:01.911649: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2800080000 Hz
******** Program Information ********
Real Time (sec) : 18.003706
User Time (sec) : 134.984205
Vector Time (sec) : 0.000004
Inst. Count : 62180981064
V. Inst. Count : 224
V. Element Count : 31996
V. Load Element Count : 8558
FLOP Count : 1
MOPS : 462.370661
MOPS (Real) : 3453.789791
MFLOPS : 0.000000
MFLOPS (Real) : 0.000000
A. V. Length : 142.839286
V. Op. Ratio (%) : 0.000051
L1 Cache Miss (sec) : 0.017768
CPU Port Conf. (sec) : 0.000000
V. Arith. Exec. (sec) : 0.000001
V. Load Exec. (sec) : 0.000001
VLD LLC Hit Element Ratio (%) : 50.899743
FMA Element Count : 0
Power Throttling (sec) : 0.000000
Thermal Throttling (sec) : 0.000000
Max Active Threads : 8
Available CPU Cores : 8
Average CPU Cores Used : 7.497579
Memory Size Used (MB) : 1680.000000
Non Swappable Memory Size Used (MB) : 106.000000
Start Time (date) : Wed Jul 6 17:01:58 2022 JST
End Time (date) : Wed Jul 6 17:02:16 2022 JST
******** Program Information ********
Real Time (sec) : 17.986776
User Time (sec) : 134.768903
Vector Time (sec) : 0.000004
Inst. Count : 62450553602
V. Inst. Count : 224
V. Element Count : 31996
V. Load Element Count : 8558
FLOP Count : 1
MOPS : 465.379193
MOPS (Real) : 3472.028028
MFLOPS : 0.000000
MFLOPS (Real) : 0.000000
A. V. Length : 142.839286
V. Op. Ratio (%) : 0.000051
L1 Cache Miss (sec) : 0.041042
CPU Port Conf. (sec) : 0.000000
V. Arith. Exec. (sec) : 0.000000
V. Load Exec. (sec) : 0.000001
VLD LLC Hit Element Ratio (%) : 51.086703
FMA Element Count : 0
Power Throttling (sec) : 0.000000
Thermal Throttling (sec) : 0.000000
Max Active Threads : 8
Available CPU Cores : 8
Average CPU Cores Used : 7.492666
Memory Size Used (MB) : 1680.000000
Non Swappable Memory Size Used (MB) : 106.000000
Start Time (date) : Wed Jul 6 17:01:58 2022 JST
End Time (date) : Wed Jul 6 17:02:16 2022 JST
Epoch 1/3
1/70 [..............................] - ETA: 48s - loss: 2.3148 - accuracy: 0.0625Epoch 1/3
70/70 [==============================] - 5.211537s 66ms/step - loss: 2.2871 - accuracy: 0.1583
70/70 [==============================] - 5.256045s 66ms/step - loss: 2.2871 - accuracy: 0.1583
Epoch 2/3
1/70 [..............................] - ETA: 4s - loss: 2.2478 - accuracy: 0.3203Epoch 2/3
70/70 [==============================] - 4.550845s 65ms/step - loss: 2.2260 - accuracy: 0.3306
70/70 [==============================] - 4.552100s 65ms/step - loss: 2.2260 - accuracy: 0.3306
Epoch 3/3
1/70 [..............................] - ETA: 4s - loss: 2.1854 - accuracy: 0.3984Epoch 3/3
70/70 [==============================] - 4.607220s 66ms/step - loss: 2.1610 - accuracy: 0.4011
70/70 [==============================] - 4.609234s 66ms/step - loss: 2.1610 - accuracy: 0.4011
#! /bin/bash
#PBS -q sxf
#PBS -l elapstim_req=00:05:00
#PBS --venode 1
#PBS -S /bin/bash
# Use 8 OpenMP threads on the single VE
export VE_OMP_NUM_THREADS=8
# Run on VE 0
export VE_NODE_NUMBER=0
cd "$PBS_O_WORKDIR"
python single.py
2022-07-06 16:38:37.205731: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
2022-07-06 16:38:37.926315: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 16:38:37.939278: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2799965000 Hz
******** Program Information ********
Real Time (sec) : 1.451267
User Time (sec) : 11.377145
Vector Time (sec) : 1.921528
Inst. Count : 7043889741
V. Inst. Count : 700399145
V. Element Count : 135200251055
V. Load Element Count : 9230575902
FLOP Count : 235667626966
MOPS : 22642.952981
MOPS (Real) : 178090.037908
MFLOPS : 20646.505025
MFLOPS (Real) : 162387.691465
A. V. Length : 108.949261
V. Op. Ratio (%) : 97.545620
L1 Cache Miss (sec) : 0.104837
CPU Port Conf. (sec) : 0.284123
V. Arith. Exec. (sec) : 0.875088
V. Load Exec. (sec) : 1.027348
VLD LLC Hit Element Ratio (%) : 59.617985
FMA Element Count : 116912156340
Power Throttling (sec) : 0.000000
Thermal Throttling (sec) : 0.000000
Max Active Threads : 8
Available CPU Cores : 8
Average CPU Cores Used : 7.839457
Memory Size Used (MB) : 1680.000000
Non Swappable Memory Size Used (MB) : 106.000000
Start Time (date) : Wed Jul 6 16:38:37 2022 JST
End Time (date) : Wed Jul 6 16:38:38 2022 JST
Epoch 1/3
70/70 [==============================] - 0.548753s 3ms/step - loss: 2.2705 - accuracy: 0.2292
Epoch 2/3
70/70 [==============================] - 0.186631s 3ms/step - loss: 2.2210 - accuracy: 0.3987
Epoch 3/3
70/70 [==============================] - 0.184858s 3ms/step - loss: 2.1580 - accuracy: 0.5301