keichi/multi_2ve.sh

## multi_2ve.sh
#! /bin/bash
#PBS -q sx
#PBS -l elapstim_req=00:05:00
#PBS --venode 2
#PBS -S /bin/bash

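# Number of OpenMP threads used on each Vector Engine (VE).
export VE_OMP_NUM_THREADS=8

cd "$PBS_O_WORKDIR"

# Start one worker per VE; VE_NODE_NUMBER selects the VE a process runs on.
VE_NODE_NUMBER=0 python multi.py &
sleep 2
VE_NODE_NUMBER=1 python multi.py

# Wait for the background worker before the job script exits.
wait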
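
The script starts two copies of multi.py that differ only in VE_NODE_NUMBER, and the error output reports a two-worker cluster on localhost:12345 and localhost:23456. multi.py itself is not part of this listing, so the following is only a sketch of how each process could build its TF_CONFIG. The helper name `build_tf_config` and the idea of reusing VE_NODE_NUMBER as the worker index are assumptions, not the author's code.

```python
import json
import os

# Worker addresses as they appear in the gRPC lines of the error output.
WORKERS = ["localhost:12345", "localhost:23456"]


def build_tf_config(task_index, workers=WORKERS):
    """Return a TF_CONFIG JSON string for one worker (hypothetical helper)."""
    if not 0 <= task_index < len(workers):
        raise ValueError(f"task index {task_index} out of range")
    return json.dumps({
        "cluster": {"worker": list(workers)},
        "task": {"type": "worker", "index": task_index},
    })


if __name__ == "__main__":
    # Assumption: the VE number doubles as the worker index.
    index = int(os.environ.get("VE_NODE_NUMBER", "0"))
    # TF_CONFIG has to be set before the distribution strategy is created.
    os.environ["TF_CONFIG"] = build_tf_config(index)
    print(os.environ["TF_CONFIG"])
```

With this layout, both processes share the same cluster description and differ only in the task index.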

## multi_2ve.sh.e139529
WARNING:tensorflow:From multi.py:26: _CollectiveAllReduceStrategyExperimental.__init__ (from tensorflow.python.distribute.collective_all_reduce_strategy) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.MultiWorkerMirroredStrategy instead
WARNING:tensorflow:From multi.py:26: _CollectiveAllReduceStrategyExperimental.__init__ (from tensorflow.python.distribute.collective_all_reduce_strategy) is deprecated and will be removed in a future version.
Instructions for updating:
use distribute.MultiWorkerMirroredStrategy instead
2022-07-06 17:01:52.143647: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 17:01:52.143673: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 17:02:00.397267: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12345, 1 -> localhost:23456}
2022-07-06 17:02:00.397413: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:12345
2022-07-06 17:02:00.431113: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:301] Initialize GrpcChannelCache for job worker -> {0 -> localhost:12345, 1 -> localhost:23456}
2022-07-06 17:02:00.431350: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:411] Started server with target: grpc://localhost:23456
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
2022-07-06 17:02:01.731374: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_FLOAT
      type: DT_INT64
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: 28
        }
        dim {
          size: 28
        }
      }
      shape {
      }
    }
  }
}

2022-07-06 17:02:01.742487: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Found an unshardable source dataset: name: "TensorSliceDataset/_2"
op: "TensorSliceDataset"
input: "Placeholder/_0"
input: "Placeholder/_1"
attr {
  key: "Toutput_types"
  value {
    list {
      type: DT_FLOAT
      type: DT_INT64
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: 28
        }
        dim {
          size: 28
        }
      }
      shape {
      }
    }
  }
}

2022-07-06 17:02:01.856867: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 17:02:01.868644: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2800080000 Hz
2022-07-06 17:02:01.899574: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 17:02:01.911649: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2800080000 Hz
            ********  Program  Information  ********
  Real Time (sec)                         :            18.003706
  User Time (sec)                         :           134.984205
  Vector Time (sec)                       :             0.000004
  Inst. Count                             :          62180981064
  V. Inst. Count                          :                  224
  V. Element Count                        :                31996
  V. Load Element Count                   :                 8558
  FLOP Count                              :                    1
  MOPS                                    :           462.370661
  MOPS (Real)                             :          3453.789791
  MFLOPS                                  :             0.000000
  MFLOPS (Real)                           :             0.000000
  A. V. Length                            :           142.839286
  V. Op. Ratio (%)                        :             0.000051
  L1 Cache Miss (sec)                     :             0.017768
  CPU Port Conf. (sec)                    :             0.000000
  V. Arith. Exec. (sec)                   :             0.000001
  V. Load Exec. (sec)                     :             0.000001
  VLD LLC Hit Element Ratio (%)           :            50.899743
  FMA Element Count                       :                    0
  Power Throttling (sec)                  :             0.000000
  Thermal Throttling (sec)                :             0.000000
  Max Active Threads                      :                    8
  Available CPU Cores                     :                    8
  Average CPU Cores Used                  :             7.497579
  Memory Size Used (MB)                   :          1680.000000
  Non Swappable Memory Size Used (MB)     :           106.000000

  Start Time (date)        :        Wed Jul  6 17:01:58 2022 JST
  End   Time (date)        :        Wed Jul  6 17:02:16 2022 JST
            ********  Program  Information  ********
  Real Time (sec)                         :            17.986776
  User Time (sec)                         :           134.768903
  Vector Time (sec)                       :             0.000004
  Inst. Count                             :          62450553602
  V. Inst. Count                          :                  224
  V. Element Count                        :                31996
  V. Load Element Count                   :                 8558
  FLOP Count                              :                    1
  MOPS                                    :           465.379193
  MOPS (Real)                             :          3472.028028
  MFLOPS                                  :             0.000000
  MFLOPS (Real)                           :             0.000000
  A. V. Length                            :           142.839286
  V. Op. Ratio (%)                        :             0.000051
  L1 Cache Miss (sec)                     :             0.041042
  CPU Port Conf. (sec)                    :             0.000000
  V. Arith. Exec. (sec)                   :             0.000000
  V. Load Exec. (sec)                     :             0.000001
  VLD LLC Hit Element Ratio (%)           :            51.086703
  FMA Element Count                       :                    0
  Power Throttling (sec)                  :             0.000000
  Thermal Throttling (sec)                :             0.000000
  Max Active Threads                      :                    8
  Available CPU Cores                     :                    8
  Average CPU Cores Used                  :             7.492666
  Memory Size Used (MB)                   :          1680.000000
  Non Swappable Memory Size Used (MB)     :           106.000000

  Start Time (date)        :        Wed Jul  6 17:01:58 2022 JST
  End   Time (date)        :        Wed Jul  6 17:02:16 2022 JST
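
The two Program Information blocks above belong to the two worker processes. Both report a V. Op. Ratio of 0.000051 % and a FLOP count of 1, whereas the single-VE run further down reports 97.545620 %. To compare such blocks programmatically, they can be parsed into dictionaries. The following is a minimal sketch: the function name `parse_proginf` is hypothetical, and it only keeps numeric fields, so the Start Time and End Time lines are skipped.

```python
import re

# Marks the start of a block, tolerating one or more spaces between the words.
HEADER_RE = re.compile(r"Program\s+Information")
# One "label : number" line inside a block.
FIELD_RE = re.compile(r"^\s*(.+?)\s*:\s+(\d+(?:\.\d+)?)\s*$")


def parse_proginf(text):
    """Return one dict of numeric fields per Program Information block."""
    blocks = []
    for line in text.splitlines():
        if HEADER_RE.search(line):
            blocks.append({})
            continue
        match = FIELD_RE.match(line)
        # Ignore anything that appears before the first block header.
        if match and blocks:
            blocks[-1][match.group(1)] = float(match.group(2))
    return blocks


# Short excerpt of the log above, used as sample input.
SAMPLE = """\
            ********  Program  Information  ********
  Real Time (sec)                         :            18.003706
  V. Op. Ratio (%)                        :             0.000051
  Start Time (date)        :        Wed Jul  6 17:01:58 2022 JST
            ********  Program  Information  ********
  Real Time (sec)                         :            17.986776
  V. Op. Ratio (%)                        :             0.000051
"""

for block in parse_proginf(SAMPLE):
    print(block["Real Time (sec)"], block["V. Op. Ratio (%)"])
```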

## multi_2ve.sh.o139529
Epoch 1/3
 1/70 [..............................] - ETA: 48s - loss: 2.3148 - accuracy: 0.0625Epoch 1/3
70/70 [==============================] - 5.211537s 66ms/step - loss: 2.2871 - accuracy: 0.1583
70/70 [==============================] - 5.256045s 66ms/step - loss: 2.2871 - accuracy: 0.1583
Epoch 2/3
 1/70 [..............................] - ETA: 4s - loss: 2.2478 - accuracy: 0.3203Epoch 2/3
70/70 [==============================] - 4.550845s 65ms/step - loss: 2.2260 - accuracy: 0.3306
70/70 [==============================] - 4.552100s 65ms/step - loss: 2.2260 - accuracy: 0.3306
Epoch 3/3
 1/70 [..............................] - ETA: 4s - loss: 2.1854 - accuracy: 0.3984Epoch 3/3
70/70 [==============================] - 4.607220s 66ms/step - loss: 2.1610 - accuracy: 0.4011
70/70 [==============================] - 4.609234s 66ms/step - loss: 2.1610 - accuracy: 0.4011

## single_ve.sh
#! /bin/bash
#PBS -q sxf
#PBS -l elapstim_req=00:05:00
#PBS --venode 1
#PBS -S /bin/bash

# Number of OpenMP threads used on the Vector Engine (VE).
export VE_OMP_NUM_THREADS=8
# Run on VE 0.
export VE_NODE_NUMBER=0

cd "$PBS_O_WORKDIR"

python single.py

## single_ve.sh.e139530
2022-07-06 16:38:37.205731: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
2022-07-06 16:38:37.926315: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-07-06 16:38:37.939278: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2799965000 Hz
            ********  Program  Information  ********
  Real Time (sec)                         :             1.451267
  User Time (sec)                         :            11.377145
  Vector Time (sec)                       :             1.921528
  Inst. Count                             :           7043889741
  V. Inst. Count                          :            700399145
  V. Element Count                        :         135200251055
  V. Load Element Count                   :           9230575902
  FLOP Count                              :         235667626966
  MOPS                                    :         22642.952981
  MOPS (Real)                             :        178090.037908
  MFLOPS                                  :         20646.505025
  MFLOPS (Real)                           :        162387.691465
  A. V. Length                            :           108.949261
  V. Op. Ratio (%)                        :            97.545620
  L1 Cache Miss (sec)                     :             0.104837
  CPU Port Conf. (sec)                    :             0.284123
  V. Arith. Exec. (sec)                   :             0.875088
  V. Load Exec. (sec)                     :             1.027348
  VLD LLC Hit Element Ratio (%)           :            59.617985
  FMA Element Count                       :         116912156340
  Power Throttling (sec)                  :             0.000000
  Thermal Throttling (sec)                :             0.000000
  Max Active Threads                      :                    8
  Available CPU Cores                     :                    8
  Average CPU Cores Used                  :             7.839457
  Memory Size Used (MB)                   :          1680.000000
  Non Swappable Memory Size Used (MB)     :           106.000000

  Start Time (date)        :        Wed Jul  6 16:38:37 2022 JST
  End   Time (date)        :        Wed Jul  6 16:38:38 2022 JST

## single_ve.sh.o139530
Epoch 1/3
70/70 [==============================] - 0.548753s 3ms/step - loss: 2.2705 - accuracy: 0.2292
Epoch 2/3
70/70 [==============================] - 0.186631s 3ms/step - loss: 2.2210 - accuracy: 0.3987
Epoch 3/3
70/70 [==============================] - 0.184858s 3ms/step - loss: 2.1580 - accuracy: 0.5301
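
The per-epoch times of the two jobs can be compared directly. The sketch below copies the epoch times from the two standard-output listings (for the two-worker job, the first line of each pair) and computes the ratio per epoch. The function name `slowdown` is hypothetical, and the two jobs were submitted to different queues (sx and sxf), so the ratios are only a rough indication.

```python
# Per-epoch wall times in seconds, copied from the standard-output listings.
MULTI_2VE = [5.211537, 4.550845, 4.607220]
SINGLE_VE = [0.548753, 0.186631, 0.184858]


def slowdown(multi, single):
    """Return the multi/single time ratio for each epoch."""
    if len(multi) != len(single):
        raise ValueError("epoch counts differ")
    return [m / s for m, s in zip(multi, single)]


for epoch, ratio in enumerate(slowdown(MULTI_2VE, SINGLE_VE), start=1):
    print(f"epoch {epoch}: two-worker run is {ratio:.1f}x slower")
```

For these numbers the ratios come out at roughly 9.5, 24.4 and 24.9.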