@keisukefukuda
Created April 12, 2019 09:37
Fri Apr 12 07:58:03 UTC 2019
================================================================================
In process.sh:
CUDA_VERSION = 9.0
PYTHON_VERSION = 2.7.15
OPENMPI_VERSION = 2.1.3
Chainer = https://github.com/chainer/chainer@master
CuPy = https://github.com/cupy/cupy@master
OMPI_COMM_WORLD_LOCAL_RANK = 1
OMPI_COMM_WORLD_RANK = 3
OMPI_COMM_WORLD_NODE_RANK = 1
OMPI_COMM_WORLD_SIZE = 4
OMPI_COMM_WORLD_LOCAL_SIZE = 2
host = kokona-job000679-worker-1
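(The OMPI_* variables above describe a 4-process job with two processes per node, so each process is expected to drive one of the node's two V100 GPUs. As a minimal sketch of the usual ChainerMN device-assignment pattern implied by this layout; the communicator name below is illustrative and not taken from process.sh:

    import chainer
    import chainermn

    # Each MPI process picks the GPU that matches its intra-node rank, so the
    # two processes on kokona-job000679-worker-1 use GPU 0 and GPU 1.
    comm = chainermn.create_communicator('hierarchical')  # name is an assumption
    device_id = comm.intra_rank            # 1 for this process (local rank 1)
    chainer.cuda.get_device_from_id(device_id).use()
    print('global rank %d -> local GPU %d' % (comm.rank, device_id))
)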
/usr/local/cuda-9.0/bin:/usr/local/pyenv/shims:/usr/local/cuda-9.2/bin:/bin:/usr/bin:/usr/local/pyenv/bin:/usr/local/openmpi-2.1.3/bin:/usr/local/openmpi-2.1.3/bin:/usr/local/pyenv/shims:/usr/local/cuda-9.2/bin:/bin:/usr/bin:/usr/local/pyenv/bin:/usr/local/pyenv/shims:/usr/local/cuda-9.2/bin:/bin:/usr/bin:/usr/local/pyenv/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/local/cuda-9.2/bin/nvcc
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
nvidia-smi
Fri Apr 12 07:58:04 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3F:00.0 Off | 0 |
| N/A 33C P0 26W / 250W | 11MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... On | 00000000:40:00.0 Off | 0 |
| N/A 34C P0 27W / 250W | 11MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Fri Apr 12 07:58:04 UTC 2019
================================================================================
Setup Python, MPI, CuDNN, etc.
================================================================================
Python 2.7.15
Fri Apr 12 07:58:04 UTC 2019
================================================================================
Install Chainer / CuPy
================================================================================
+ export CPATH=/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ CPATH=/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ export LD_LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ LD_LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ export LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
+ LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
Waiting for Chainer repository to be set up...
+ export CPATH=/usr/local/nccl/2.4.2-1/include:/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ CPATH=/usr/local/nccl/2.4.2-1/include:/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ export LD_LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ LD_LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ export LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
+ LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
+ '[' 1 -eq 0 ']'
+ echo 'Waiting for Chainer repository to be set up...'
+ sleep 30
+ true
+ '[' -f /tmp/chainer_install_done ']'
Still waiting for /tmp/chainer_install_done to be created
+ echo 'Still waiting for /tmp/chainer_install_done to be created'
+ sleep 5
+ true
+ '[' -f /tmp/chainer_install_done ']'
Still waiting for /tmp/chainer_install_done to be created
+ echo 'Still waiting for /tmp/chainer_install_done to be created'
+ sleep 5
+ true
+ '[' -f /tmp/chainer_install_done ']'
Still waiting for /tmp/chainer_install_done to be created
+ echo 'Still waiting for /tmp/chainer_install_done to be created'
+ sleep 5
+ true
+ '[' -f /tmp/chainer_install_done ']'
Found /tmp/chainer_install_done file
+ echo 'Found /tmp/chainer_install_done file'
+ break
+ set -e
+ date
Fri Apr 12 07:58:49 UTC 2019
+ cat
================================================================================
Chainer software versions:
================================================================================
+ echo 'Chainer versions:'
Chainer versions:
+ python -c 'import chainer; print('\''Chainer '\'', chainer.__version__)'
('Chainer ', '6.0.0rc1')
+ python -c 'import cupy; print('\''Cupy '\'', cupy.__version__)'
('Cupy ', '6.0.0rc1')
+ python -c 'import chainer; chainer.print_runtime_info()'
Platform: Linux-4.4.0-116-generic-x86_64-with-debian-stretch-sid
Chainer: 6.0.0rc1
NumPy: 1.16.2
CuPy:
  CuPy Version          : 6.0.0rc1
  CUDA Root             : /usr/local/cuda-9.2
  CUDA Build Version    : 9020
  CUDA Driver Version   : 10000
  CUDA Runtime Version  : 9020
  cuDNN Build Version   : 7500
  cuDNN Version         : 7500
  NCCL Build Version    : 2402
  NCCL Runtime Version  : 2402
iDeep: Not Available
+ date
Fri Apr 12 07:58:54 UTC 2019
+ echo =====================================================================
=====================================================================
+ echo ' ibstat'
ibstat
+ echo =====================================================================
=====================================================================
+ ibstat
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.23.1000
    Hardware version: 0
    Node GUID: 0x506b4b03001c9e1a
    System image GUID: 0x506b4b03001c9e1a
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 351
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x506b4b03001c9e1a
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.23.1000
    Hardware version: 0
    Node GUID: 0x506b4b03001c9dce
    System image GUID: 0x506b4b03001c9dce
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 334
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x506b4b03001c9dce
        Link layer: InfiniBand
+ date
Fri Apr 12 07:58:54 UTC 2019
+ echo =====================================================================
+ echo ' nvidia-smi'
=====================================================================
nvidia-smi
+ echo =====================================================================
=====================================================================
+ nvidia-smi
Fri Apr 12 07:58:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:3F:00.0 Off | 0 |
| N/A 33C P0 26W / 250W | 11MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... On | 00000000:40:00.0 Off | 0 |
| N/A 34C P0 27W / 250W | 11MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
+ nvidia-smi -x -q
+ grep uuid
<uuid>GPU-f2688668-b481-6666-6305-fc9f28487066</uuid>
<uuid>GPU-bfc28117-c095-cf68-cf17-14c444e92540</uuid>
+ date
Fri Apr 12 07:58:57 UTC 2019
+ echo =====================================================================
=====================================================================
+ echo ' chainermn-micro-benchmark'
chainermn-micro-benchmark
+ echo =====================================================================
=====================================================================
+ date
Fri Apr 12 07:58:57 UTC 2019
+ cat
================================================================================
Main task
================================================================================
+ cd /chainer
+ case $KOKONA_TARGET in
+ TIMEOUT=7200
+ timeout -s KILL -k 30 7200 python -m pytest --color=yes --full-trace --duration=10 -x --capture=no -s -v -m 'not slow' tests/chainermn_tests
============================= test session starts ==============================
platform linux2 -- Python 2.7.15, pytest-4.4.0, py-1.8.0, pluggy-0.9.0 -- /usr/local/pyenv/versions/2.7.15/bin/python
cachedir: .pytest_cache
rootdir: /chainer, inifile: setup.cfg
collecting ... mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted

collecting 0 items 
collecting 174 items 
collected 266 items / 4 deselected / 262 selected 
tests/chainermn_tests/communicator_tests/test_communication_utility.py::TestCommunicationUtility::test_chunked_bcast_objs PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_cpu[param0] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param0] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param1] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param2] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param3] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param4] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param5] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param6] FAILED
=================================== FAILURES ===================================
________________________ test_communicator_gpu[param6] _________________________
cls = <class '_pytest.runner.CallInfo'>
func = <function <lambda> at 0x7efd48829cf8>, when = 'call'
reraise = (<class '_pytest.outcomes.Exit'>, <type 'exceptions.KeyboardInterrupt'>)
    @classmethod
    def from_call(cls, func, when, reraise=None):
        #: context of invocation: one of "setup", "call",
        #: "teardown", "memocollect"
        start = time()
        excinfo = None
        try:
>           result = func()
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/runner.py:226:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>       lambda: ihook(item=item, **kwds), when=when, reraise=reraise
        )
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/runner.py:198:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_HookCaller 'pytest_runtest_call'>, args = ()
kwargs = {'item': <Function test_communicator_gpu[param6]>}, notincall = set([])
    def __call__(self, *args, **kwargs):
        if args:
            raise TypeError("hook calling supports only keyword arguments")
        assert not self.is_historic()
        if self.spec and self.spec.argnames:
            notincall = (
                set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
            )
            if notincall:
                warnings.warn(
                    "Argument(s) {} which are declared in the hookspec "
                    "can not be found in this hook call".format(tuple(notincall)),
                    stacklevel=2,
                )
>       return self._hookexec(self, self.get_hookimpls(), kwargs)
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/hooks.py:289:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_pytest.config.PytestPluginManager object at 0x7efd5d434850>
hook = <_HookCaller 'pytest_runtest_call'>
methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...u[param6]>>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7efd5d73aa90>>]
kwargs = {'item': <Function test_communicator_gpu[param6]>}
    def _hookexec(self, hook, methods, kwargs):
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook, methods, kwargs)
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
hook = <_HookCaller 'pytest_runtest_call'>
methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...u[param6]>>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7efd5d73aa90>>]
kwargs = {'item': <Function test_communicator_gpu[param6]>}
        self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
            methods,
            kwargs,
>           firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
        )
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
item = <Function test_communicator_gpu[param6]>
    def pytest_runtest_call(item):
        _update_current_test_var(item, "call")
        sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
        try:
>           item.runtest()
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/runner.py:123:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Function test_communicator_gpu[param6]>
    def runtest(self):
        """ execute the underlying test function. """
>       self.ihook.pytest_pyfunc_call(pyfuncitem=self)
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/python.py:1464:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
kwargs = {'pyfuncitem': <Function test_communicator_gpu[param6]>}
notincall = set([])
    def __call__(self, *args, **kwargs):
        if args:
            raise TypeError("hook calling supports only keyword arguments")
        assert not self.is_historic()
        if self.spec and self.spec.argnames:
            notincall = (
                set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
            )
            if notincall:
                warnings.warn(
                    "Argument(s) {} which are declared in the hookspec "
                    "can not be found in this hook call".format(tuple(notincall)),
                    stacklevel=2,
                )
>       return self._hookexec(self, self.get_hookimpls(), kwargs)
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/hooks.py:289:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_pytest.config.PytestPluginManager object at 0x7efd5d434850>
hook = <_HookCaller 'pytest_pyfunc_call'>
methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...=<module '_pytest.skipping' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/skipping.pyc'>>]
kwargs = {'pyfuncitem': <Function test_communicator_gpu[param6]>}
    def _hookexec(self, hook, methods, kwargs):
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook, methods, kwargs)
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
hook = <_HookCaller 'pytest_pyfunc_call'>
methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...=<module '_pytest.skipping' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/skipping.pyc'>>]
kwargs = {'pyfuncitem': <Function test_communicator_gpu[param6]>}
        self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
            methods,
            kwargs,
>           firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
        )
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyfuncitem = <Function test_communicator_gpu[param6]>
    @hookimpl(trylast=True)
    def pytest_pyfunc_call(pyfuncitem):
        testfunction = pyfuncitem.obj
        iscoroutinefunction = getattr(inspect, "iscoroutinefunction", None)
        if iscoroutinefunction is not None and iscoroutinefunction(testfunction):
            msg = "Coroutine functions are not natively supported and have been skipped.\n"
            msg += "You need to install a suitable plugin for your async framework, for example:\n"
            msg += " - pytest-asyncio\n"
            msg += " - pytest-trio\n"
            msg += " - pytest-tornasync"
            warnings.warn(PytestWarning(msg.format(pyfuncitem.nodeid)))
            skip(msg="coroutine function and no async plugin installed (see warnings)")
        funcargs = pyfuncitem.funcargs
        testargs = {arg: funcargs[arg] for arg in pyfuncitem._fixtureinfo.argnames}
>       testfunction(**testargs)
/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/python.py:178:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
param = {'allreduce_grad_dtype': None,
'batched_copy': False,
'communicator_class': ...one,
'gpu': False,
'model_dtype': None,
'multi_node': True,
'nccl1': False}
    @pytest.mark.parametrize('param', gpu_params)
    @chainer.testing.attr.gpu
    def test_communicator_gpu(param):
        check_send_recv(param, True)
>       check_collective_communication(param, True)
tests/chainermn_tests/communicator_tests/test_communicator.py:476:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
param = {'allreduce_grad_dtype': None,
'batched_copy': False,
'communicator_class': ...one,
'gpu': False,
'model_dtype': None,
'multi_node': True,
'nccl1': False}
use_gpu = True
    def check_collective_communication(param, use_gpu):
        communicator = create_communicator(param, use_gpu)
        mpi_comm.barrier()

        model = ExampleModel(param.model_dtype)
        if use_gpu:
            model.to_gpu()
        check_bcast_data(communicator, model)
>       check_allreduce_grad(communicator, model)
tests/chainermn_tests/communicator_tests/test_communicator.py:444:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
communicator = <chainermn.communicators.two_dimensional_communicator.TwoDimensionalCommunicator object at 0x7efd4882dc50>
model = <test_communicator.ExampleModel object at 0x7efd4882dbd0>
    def check_allreduce_grad(communicator, model):
        # We need to repeat twice for regressions on lazy initialization of
        # sub communicators.

        for _ in range(2):
            model.a.W.grad[:] = communicator.rank
            model.b.W.grad[:] = communicator.rank + 1
            model.c.b.grad[:] = communicator.rank + 2

            communicator.allreduce_grad(model)
            base = (communicator.size - 1.0) / 2

            chainer.testing.assert_allclose(model.a.W.grad,
                                            (base + 0) * np.ones((3, 2)))
            chainer.testing.assert_allclose(model.b.W.grad,
                                            (base + 1) * np.ones((4, 3)))
            chainer.testing.assert_allclose(model.c.b.grad,
>                                           (base + 2) * np.ones((5, )))
tests/chainermn_tests/communicator_tests/test_communicator.py:316:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = array([3.5, 3.5, 3.5, 5. , 5. ], dtype=float32)
y = array([3.5, 3.5, 3.5, 3.5, 3.5]), atol = 1e-05, rtol = 0.0001
verbose = True
    def assert_allclose(x, y, atol=1e-5, rtol=1e-4, verbose=True):
        """Asserts if some corresponding element of x and y differs too much.

        This function can handle both CPU and GPU arrays simultaneously.

        Args:
            x: Left-hand-side array.
            y: Right-hand-side array.
            atol (float): Absolute tolerance.
            rtol (float): Relative tolerance.
            verbose (bool): If ``True``, it outputs verbose messages on error.

        """
        x = backend.CpuDevice().send(utils.force_array(x))
        y = backend.CpuDevice().send(utils.force_array(y))
        try:
            numpy.testing.assert_allclose(
                x, y, atol=atol, rtol=rtol, verbose=verbose)
        except AssertionError as e:
            f = six.StringIO()
            f.write(str(e) + '\n\n')
            f.write(
                'assert_allclose failed: \n' +
                '  shape: {} {}\n'.format(x.shape, y.shape) +
                '  dtype: {} {}\n'.format(x.dtype, y.dtype))
            if x.shape == y.shape:
                xx = numpy.atleast_1d(x)
                yy = numpy.atleast_1d(y)
                err = numpy.abs(xx - yy)
                tol_err = atol + rtol * numpy.abs(yy).astype(numpy.float64)
                i = numpy.unravel_index(
                    numpy.argmax(err.astype(numpy.float64) - tol_err), err.shape)
                if yy[i] == 0:
                    rel_err = 'inf'
                else:
                    rel_err = err[i] / numpy.abs(yy[i])
                f.write(
                    '  i: {}\n'.format(i) +
                    '  x[i]: {}\n'.format(xx[i]) +
                    '  y[i]: {}\n'.format(yy[i]) +
                    '  relative error[i]: {}\n'.format(rel_err) +
                    '  absolute error[i]: {}\n'.format(err[i]))
            opts = numpy.get_printoptions()
            try:
                numpy.set_printoptions(threshold=10000)
                f.write('x: ' + numpy.array2string(x, prefix='x: ') + '\n')
                f.write('y: ' + numpy.array2string(y, prefix='y: ') + '\n')
            finally:
                numpy.set_printoptions(**opts)
>           raise AssertionError(f.getvalue())
E AssertionError: 
E Not equal to tolerance rtol=0.0001, atol=1e-05
E 
E Mismatch: 40%
E Max absolute difference: 1.5
E Max relative difference: 0.42857143
E x: array([3.5, 3.5, 3.5, 5. , 5. ], dtype=float32)
E y: array([3.5, 3.5, 3.5, 3.5, 3.5])
E 
E assert_allclose failed: 
E shape: (5,) (5,)
E dtype: float32 float64
E i: (3,)
E x[i]: 5.0
E y[i]: 3.5
E relative error[i]: 0.428571428571
E absolute error[i]: 1.5
E x: [3.5 3.5 3.5 5. 5. ]
E y: [3.5 3.5 3.5 3.5 3.5]
chainer/testing/array.py:59: AssertionError
========================== slowest 10 test durations ===========================
4.89s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param0]
2.30s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param1]
1.61s call tests/chainermn_tests/communicator_tests/test_communication_utility.py::TestCommunicationUtility::test_chunked_bcast_objs
0.99s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param4]
0.23s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_cpu[param0]
0.22s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param5]
0.20s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param2]
0.20s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param6]
0.10s call tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param3]
(0.00 durations hidden. Use -vv to show these durations.)
============== 1 failed, 8 passed, 4 deselected in 13.16 seconds ===============
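For reference, the expected values in the assert_allclose failure above follow directly from the averaging done in check_allreduce_grad. A minimal NumPy sketch of that arithmetic, assuming the 4 MPI ranks reported by OMPI_COMM_WORLD_SIZE at the top of this log:

    import numpy as np

    # check_allreduce_grad fills model.c.b.grad with (rank + 2) on each rank,
    # then allreduce_grad averages over ranks, so every element should become
    # mean(0, 1, 2, 3) + 2 = 1.5 + 2 = 3.5.
    size = 4
    base = (size - 1.0) / 2                 # 1.5, as computed in the test
    expected = (base + 2) * np.ones((5,))   # [3.5 3.5 3.5 3.5 3.5]

    # What this process (OMPI_COMM_WORLD_RANK = 3) holds if its buffer is
    # never reduced at all: 3 + 2 = 5.
    unreduced = (3 + 2) * np.ones((5,))     # [5. 5. 5. 5. 5.]

The observed array [3.5, 3.5, 3.5, 5., 5.] matches the expectation only in its first three elements; the trailing 5.0 entries equal this rank's own unreduced contribution, which suggests that part of model.c.b.grad was not reduced across all four ranks by the TwoDimensionalCommunicator.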
------------------------------------------------------------
Error occurred on /process.sh [Line 303]: Status 1
PID: 15
Current directory: /chainer
Command line: /process.sh
------------------------------------------------------------
++ onerror 303
++ status=1
++ script=/process.sh
++ line=303
++ shift
++ args=
++ echo ''
++ echo ------------------------------------------------------------
++ echo 'Error occurred on /process.sh [Line 303]: Status 1'
++ echo ''
++ echo 'PID: 15'
+++ id
++ echo 'Current directory: /chainer'
++ echo 'Command line: /process.sh '
++ echo ------------------------------------------------------------
++ echo ''