Last active
July 6, 2022 08:03
Multi-VE training using TensorFlow-VE
#!/bin/bash
#PBS -q sx
#PBS -l elapstim_req=00:05:00
#PBS --venode 2
#PBS -S /bin/bash
export VE_OMP_NUM_THREADS=8
cd "$PBS_O_WORKDIR"
# Start one worker per VE; worker 0 runs in the background.
VE_NODE_NUMBER=0 python multi.py &
sleep 2
VE_NODE_NUMBER=1 python multi.py
# Wait for the background worker before the job script exits.
wait
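The job script starts two copies of multi.py that differ only in VE_NODE_NUMBER, and the log output reports a two-worker cluster at localhost:12345 and localhost:23456. multi.py itself is not included here, so the following is only a minimal sketch of how each process could derive its TF_CONFIG from VE_NODE_NUMBER. The helper name `build_tf_config` and the mapping from VE number to worker index are assumptions for illustration, not the author's code.

```python
import json
import os

# Worker addresses as reported in the GrpcChannelCache log lines.
# Mapping VE_NODE_NUMBER directly to the worker index is an assumption.
WORKERS = ["localhost:12345", "localhost:23456"]


def build_tf_config(ve_node_number: int) -> str:
    """Return a TF_CONFIG JSON string for the worker bound to the given VE."""
    if not 0 <= ve_node_number < len(WORKERS):
        raise ValueError(f"no worker defined for VE {ve_node_number}")
    return json.dumps({
        "cluster": {"worker": WORKERS},
        "task": {"type": "worker", "index": ve_node_number},
    })


if __name__ == "__main__":
    # TF_CONFIG has to be in the environment before the distribution
    # strategy is constructed in the training script.
    ve = int(os.environ.get("VE_NODE_NUMBER", "0"))
    os.environ["TF_CONFIG"] = build_tf_config(ve)
    print(os.environ["TF_CONFIG"])
```

With TF_CONFIG set this way, a script would typically create `tf.distribute.MultiWorkerMirroredStrategy()` afterwards, which is also the replacement the deprecation warning in the log recommends.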
WARNING:tensorflow:From multi.py:26: _CollectiveAllReduceStrategyExperimental.__init__ (from tensorflow.python.distribute.collective_all_reduce_strategy) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.MultiWorkerMirroredStrategy instead
WARNING:tensorflow:From multi.py:26: _CollectiveAllReduceStrategyExperimental.__init__ (from tensorflow.python.distribute.collective_all_reduce_strategy) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.MultiWorkerMirroredStrategy instead
2022-07-06 17:01:52.143647: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 17:01:52.143673: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 17:02:00.397267: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12345, 1 -> localhost:23456}
2022-07-06 17:02:00.397413: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:12345
2022-07-06 17:02:00.431113: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12345, 1 -> localhost:23456}
2022-07-06 17:02:00.431350: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:23456
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
2022-07-06 17:02:01.731374: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_FLOAT
      type: DT_INT64
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: 28
        }
        dim {
          size: 28
        }
      }
      shape {
      }
    }
  }
}
2022-07-06 17:02:01.742487: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_FLOAT
      type: DT_INT64
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: 28
        }
        dim {
          size: 28
        }
      }
      shape {
      }
    }
  }
}
2022-07-06 17:02:01.856867: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 17:02:01.868644: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2800080000 Hz
2022-07-06 17:02:01.899574: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 17:02:01.911649: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2800080000 Hz
******** Program Information ********
Real Time (sec) : 18.003706
User Time (sec) : 134.984205
Vector Time (sec) : 0.000004
Inst. Count : 62180981064
V. Inst. Count : 224
V. Element Count : 31996
V. Load Element Count : 8558
FLOP Count : 1
MOPS : 462.370661
MOPS (Real) : 3453.789791
MFLOPS : 0.000000
MFLOPS (Real) : 0.000000
A. V. Length : 142.839286
V. Op. Ratio (%) : 0.000051
L1 Cache Miss (sec) : 0.017768
CPU Port Conf. (sec) : 0.000000
V. Arith. Exec. (sec) : 0.000001
V. Load Exec. (sec) : 0.000001
VLD LLC Hit Element Ratio (%) : 50.899743
FMA Element Count : 0
Power Throttling (sec) : 0.000000
Thermal Throttling (sec) : 0.000000
Max Active Threads : 8
Available CPU Cores : 8
Average CPU Cores Used : 7.497579
Memory Size Used (MB) : 1680.000000
Non Swappable Memory Size Used (MB) : 106.000000
Start Time (date) : Wed Jul 6 17:01:58 2022 JST
End Time (date) : Wed Jul 6 17:02:16 2022 JST
******** Program Information ********
Real Time (sec) : 17.986776
User Time (sec) : 134.768903
Vector Time (sec) : 0.000004
Inst. Count : 62450553602
V. Inst. Count : 224
V. Element Count : 31996
V. Load Element Count : 8558
FLOP Count : 1
MOPS : 465.379193
MOPS (Real) : 3472.028028
MFLOPS : 0.000000
MFLOPS (Real) : 0.000000
A. V. Length : 142.839286
V. Op. Ratio (%) : 0.000051
L1 Cache Miss (sec) : 0.041042
CPU Port Conf. (sec) : 0.000000
V. Arith. Exec. (sec) : 0.000000
V. Load Exec. (sec) : 0.000001
VLD LLC Hit Element Ratio (%) : 51.086703
FMA Element Count : 0
Power Throttling (sec) : 0.000000
Thermal Throttling (sec) : 0.000000
Max Active Threads : 8
Available CPU Cores : 8
Average CPU Cores Used : 7.492666
Memory Size Used (MB) : 1680.000000
Non Swappable Memory Size Used (MB) : 106.000000
Start Time (date) : Wed Jul 6 17:01:58 2022 JST
End Time (date) : Wed Jul 6 17:02:16 2022 JST
Epoch 1/3
1/70 [..............................] - ETA: 48s - loss: 2.3148 - accuracy: 0.0625Epoch 1/3
70/70 [==============================] - 5.211537s 66ms/step - loss: 2.2871 - accuracy: 0.1583
70/70 [==============================] - 5.256045s 66ms/step - loss: 2.2871 - accuracy: 0.1583
Epoch 2/3
1/70 [..............................] - ETA: 4s - loss: 2.2478 - accuracy: 0.3203Epoch 2/3
70/70 [==============================] - 4.550845s 65ms/step - loss: 2.2260 - accuracy: 0.3306
70/70 [==============================] - 4.552100s 65ms/step - loss: 2.2260 - accuracy: 0.3306
Epoch 3/3
1/70 [..............................] - ETA: 4s - loss: 2.1854 - accuracy: 0.3984Epoch 3/3
70/70 [==============================] - 4.607220s 66ms/step - loss: 2.1610 - accuracy: 0.4011
70/70 [==============================] - 4.609234s 66ms/step - loss: 2.1610 - accuracy: 0.4011
#!/bin/bash
#PBS -q sxf
#PBS -l elapstim_req=00:05:00
#PBS --venode 1
#PBS -S /bin/bash
export VE_OMP_NUM_THREADS=8
export VE_NODE_NUMBER=0
cd "$PBS_O_WORKDIR"
python single.py
2022-07-06 16:38:37.205731: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
2022-07-06 16:38:37.926315: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 16:38:37.939278: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2799965000 Hz
******** Program Information ********
Real Time (sec) : 1.451267
User Time (sec) : 11.377145
Vector Time (sec) : 1.921528
Inst. Count : 7043889741
V. Inst. Count : 700399145
V. Element Count : 135200251055
V. Load Element Count : 9230575902
FLOP Count : 235667626966
MOPS : 22642.952981
MOPS (Real) : 178090.037908
MFLOPS : 20646.505025
MFLOPS (Real) : 162387.691465
A. V. Length : 108.949261
V. Op. Ratio (%) : 97.545620
L1 Cache Miss (sec) : 0.104837
CPU Port Conf. (sec) : 0.284123
V. Arith. Exec. (sec) : 0.875088
V. Load Exec. (sec) : 1.027348
VLD LLC Hit Element Ratio (%) : 59.617985
FMA Element Count : 116912156340
Power Throttling (sec) : 0.000000
Thermal Throttling (sec) : 0.000000
Max Active Threads : 8
Available CPU Cores : 8
Average CPU Cores Used : 7.839457
Memory Size Used (MB) : 1680.000000
Non Swappable Memory Size Used (MB) : 106.000000
Start Time (date) : Wed Jul 6 16:38:37 2022 JST
End Time (date) : Wed Jul 6 16:38:38 2022 JST
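The Program Information blocks are easier to compare once parsed. The sketch below reads the `name : value` lines into a dictionary and prints a few counters from the multi-VE and single-VE runs side by side. The excerpts are copied from the blocks in this document; the function name `parse_proginf` is a hypothetical helper, not part of any NEC tool. One possible reading of the numbers is that the multi-VE processes executed almost no vector instructions (V. Op. Ratio 0.000051 % against 97.545620 %), but the logs alone do not establish the cause.

```python
import re

# Excerpts copied from the Program Information blocks in this document.
MULTI_VE = """\
Real Time (sec) : 18.003706
Vector Time (sec) : 0.000004
FLOP Count : 1
V. Op. Ratio (%) : 0.000051
Start Time (date) : Wed Jul 6 17:01:58 2022 JST
"""
SINGLE_VE = """\
Real Time (sec) : 1.451267
Vector Time (sec) : 1.921528
FLOP Count : 235667626966
V. Op. Ratio (%) : 97.545620
Start Time (date) : Wed Jul 6 16:38:37 2022 JST
"""

# A counter name, a colon surrounded by whitespace, then a plain number.
LINE_RE = re.compile(r"^\s*(.+?)\s+:\s+(\d+(?:\.\d+)?)\s*$")


def parse_proginf(text):
    """Return the numeric counters of one block; non-numeric lines are skipped."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            metrics[match.group(1)] = float(match.group(2))
    return metrics


if __name__ == "__main__":
    multi = parse_proginf(MULTI_VE)
    single = parse_proginf(SINGLE_VE)
    for key in ("Real Time (sec)", "Vector Time (sec)", "V. Op. Ratio (%)"):
        print(f"{key:22s} multi={multi[key]:<12g} single={single[key]:g}")
```

Date lines such as Start Time do not end in a plain number, so they are left out of the result rather than raising an error.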
Epoch 1/3
70/70 [==============================] - 0.548753s 3ms/step - loss: 2.2705 - accuracy: 0.2292
Epoch 2/3
70/70 [==============================] - 0.186631s 3ms/step - loss: 2.2210 - accuracy: 0.3987
Epoch 3/3
70/70 [==============================] - 0.184858s 3ms/step - loss: 2.1580 - accuracy: 0.5301
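The per-epoch times can be pulled out of the Keras progress lines to compare the two runs: the multi-VE run reports about 65 to 66 ms/step, the single-VE run about 3 ms/step. The following is a small sketch that extracts the epoch durations and prints their ratio. The log excerpts are copied from this document (one worker's lines for the multi-VE run), and `epoch_seconds` is a hypothetical helper name.

```python
import re

# Completed-epoch lines copied from the training output in this document.
MULTI_VE = """\
70/70 [==============================] - 5.211537s 66ms/step - loss: 2.2871 - accuracy: 0.1583
70/70 [==============================] - 4.550845s 65ms/step - loss: 2.2260 - accuracy: 0.3306
70/70 [==============================] - 4.607220s 66ms/step - loss: 2.1610 - accuracy: 0.4011
"""
SINGLE_VE = """\
70/70 [==============================] - 0.548753s 3ms/step - loss: 2.2705 - accuracy: 0.2292
70/70 [==============================] - 0.186631s 3ms/step - loss: 2.2210 - accuracy: 0.3987
70/70 [==============================] - 0.184858s 3ms/step - loss: 2.1580 - accuracy: 0.5301
"""

# Match only finished epochs (step count equals total) and capture the duration.
EPOCH_RE = re.compile(r"^(\d+)/\1 \[=+\] - (\d+(?:\.\d+)?)s ", re.MULTILINE)


def epoch_seconds(log):
    """Return the wall-clock seconds of every completed epoch in a Keras log."""
    return [float(m.group(2)) for m in EPOCH_RE.finditer(log)]


if __name__ == "__main__":
    pairs = zip(epoch_seconds(MULTI_VE), epoch_seconds(SINGLE_VE))
    for epoch, (multi, single) in enumerate(pairs, start=1):
        print(f"epoch {epoch}: multi {multi:.3f}s, single {single:.3f}s, "
              f"ratio {multi / single:.1f}x")
```

In-progress lines such as `1/70 [....` are ignored because the step count does not equal the total.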