def train_fn():
    # Make sure pyarrow is referenced before anything else to avoid segfault due to conflict
    # with TensorFlow libraries. Use `pa` package reference to ensure it's loaded before
    # functions like `deserialize_model` which are implemented at the top level.
    # See https://jira.apache.org/jira/browse/ARROW-3346
    pa
    # import atexit
    import horovod.tensorflow.keras as hvd
    import os
# Adapted from: https://www.tensorflow.org/beta/tutorials/distribute/multi_worker_with_keras
from __future__ import absolute_import, division, print_function, unicode_literals

def main_fun(args, ctx):
    import tensorflow as tf
    import numpy as np
    import imagecodecs
................
................
def main_fun(args, ctx):
    batch_size = 32
    print(len(trainx))  # -----> 672
    # 672/32 = 21
    # Create distribute strategy
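The arithmetic in the comments above (672 samples at a batch size of 32 giving 21 batches) can be checked with a short sketch. The helper name `steps_per_epoch` is hypothetical and not part of the original gist, and it assumes that a partial final batch still counts as one step.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover num_samples once (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # A partial final batch still counts as one step, hence the ceiling.
    return math.ceil(num_samples / batch_size)


# 672 samples at batch size 32, the values shown in the snippet above.
print(steps_per_epoch(672, 32))
```

With the values from the snippet the division is exact, so the ceiling only matters when the sample count is not a multiple of the batch size.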
Spark Executor Command: "/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java" "-cp" "/home/node/spark/conf/:/home/node/spark/jars/*:/home/node/hadoop3.1.1/etc/hadoop/" "-Xmx9216M" "-Dspark.driver.port=38237" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@node:38237" "--executor-id" "1" "--hostname" "172.16.44.121" "--cores" "4" "--app-id" "app-20200811034057-0007" "--worker-url" "spark://Worker@172.16.44.121:40515"
========================================
2020-08-11 03:40:58,546 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 48258@node
2020-08-11 03:40:58,550 INFO util.SignalUtils: Registered signal handler for TERM
2020-08-11 03:40:58,551 INFO util.SignalUtils: Registered signal handler for HUP
2020-08-11 03:40:58,551 INFO util.SignalUtils: Registered signal handler for INT
2020-08-11 03:40:59,438 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
def unet(shape=(128, 128, 4)):
    # Left side of the U-Net
    inputs = Input(shape)
    conv1 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='random_normal')(inputs)
    conv1 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='random_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='random_normal')(pool1)
    conv2 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='random_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
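As a rough check on the encoder above, the sketch below traces the tensor shape through the two stages without needing Keras. The helper `encoder_shapes` is hypothetical and not part of the original gist. It assumes 'same'-padded convolutions leave height and width unchanged and that each 2x2 max pooling halves them, rounding down.

```python
def encoder_shapes(size=128, filters=(64, 128), pool=2):
    """Trace (height, width, channels) after each encoder stage (hypothetical helper).

    Assumes 'same'-padded convolutions (spatial size unchanged) followed by
    non-overlapping pool x pool max pooling, as in the snippet above.
    """
    shapes = []
    for f in filters:
        # 'same' padding keeps height and width; channels become f.
        shapes.append(("conv", (size, size, f)))
        # Non-overlapping pooling divides height and width by the pool size.
        size //= pool
        shapes.append(("pool", (size, size, f)))
    return shapes


for name, shape in encoder_shapes():
    print(name, shape)
```

Under these assumptions the 128x128 input reaches `pool2` as a 32x32 map with 128 channels. The input's 4 channels do not appear in the trace because the first convolution already maps them to 64.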
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-26-55bddb4c82a6> in <module>
      1 # dist_model.summary()
----> 2 history = dist_model.fit(trainx, trainy_hot, epochs=1, validation_data = (testx, testy_hot),batch_size=64, verbose=1)

~\Anaconda3\envs\open_cv\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
   1211         else:
   1212             fit_inputs = x + y + sample_weights
-> 1213         self._make_train_function()
Spark Executor Command: "/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java" "-cp" "/home/orwa/spark/conf/:/home/orwa/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=37501" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@master:37501" "--executor-id" "0" "--hostname" "192.168.198.131" "--cores" "2" "--app-id" "app-20200630012803-0001" "--worker-url" "spark://Worker@192.168.198.131:37685"
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/30 01:28:39 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 8519@orwa-virtual-machine
20/06/30 01:28:39 INFO SignalUtils: Registered signal handler for TERM
20/06/30 01:28:39 INFO SignalUtils: Registered signal handler for HUP
20/06/30 01:28:39 INFO SignalUtils: Registered signal handler for INT
20/06/30 01:28:39 WARN Utils: Your hostname, orwa-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.198.131 inste
Spark Executor Command: "/usr/lib/jvm/java-1.8.0-openjdk-amd64/bin/java" "-cp" "/home/orwa/spark/conf/:/home/orwa/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=36393" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.198.131:36393" "--executor-id" "0" "--hostname" "192.168.198.131" "--cores" "2" "--app-id" "app-20200624132821-0010" "--worker-url" "spark://Worker@192.168.198.131:36489"
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/24 13:28:22 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 6648@orwa-virtual-machine
20/06/24 13:28:22 INFO SignalUtils: Registered signal handler for TERM
20/06/24 13:28:22 INFO SignalUtils: Registered signal handler for HUP
20/06/24 13:28:22 INFO SignalUtils: Registered signal handler for INT
20/06/24 13:28:22 WARN Utils: Your hostname, orwa-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.198.
20/06/19 13:41:30 WARN Utils: Your hostname, orwa-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.198.131 instead (on interface ens33)
20/06/19 13:41:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/19 13:41:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-06-19 13:41:33.862495: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-06-19 13:41:33.862706: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-06-19 13:41:33.862730: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with Ten
20/06/19 13:30:06 WARN Utils: Your hostname, orwa-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.198.131 instead (on interface ens33)
20/06/19 13:30:06 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/19 13:30:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-06-19 13:30:10.735978: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-06-19 13:30:10.736517: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-06-19 13:30:10.736634: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with Te