Fri Apr 12 07:58:03 UTC 2019
================================================================================
In process.sh:
CUDA_VERSION = 9.0
PYTHON_VERSION = 2.7.15
OPENMPI_VERSION = 2.1.3
Chainer = https://github.com/chainer/chainer@master
CuPy = https://github.com/cupy/cupy@master
OMPI_COMM_WORLD_LOCAL_RANK = 1
OMPI_COMM_WORLD_RANK = 3
OMPI_COMM_WORLD_NODE_RANK = 1
OMPI_COMM_WORLD_SIZE = 4
OMPI_COMM_WORLD_LOCAL_SIZE = 2
host = kokona-job000679-worker-1
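[Annotation] The OMPI_COMM_WORLD_* variables above identify this process as local rank 1 of 2 on its node and global rank 3 of 4, with two GPUs per node. A minimal sketch (not part of process.sh) of the usual ChainerMN pattern for pinning each local rank to its own GPU; create_communicator and intra_rank are real ChainerMN APIs, the surrounding script is illustrative:

import chainer
import chainermn

# One communicator per MPI process; 'pure_nccl' is one of the
# available ChainerMN backends.
comm = chainermn.create_communicator('pure_nccl')

# intra_rank is the rank within the node (0 or 1 here, two GPUs per
# node), so each local process drives a distinct CUDA device.
device = comm.intra_rank
chainer.cuda.get_device_from_id(device).use()

print('global rank %d -> GPU %d' % (comm.rank, device))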
/usr/local/cuda-9.0/bin:/usr/local/pyenv/shims:/usr/local/cuda-9.2/bin:/bin:/usr/bin:/usr/local/pyenv/bin:/usr/local/openmpi-2.1.3/bin:/usr/local/openmpi-2.1.3/bin:/usr/local/pyenv/shims:/usr/local/cuda-9.2/bin:/bin:/usr/bin:/usr/local/pyenv/bin:/usr/local/pyenv/shims:/usr/local/cuda-9.2/bin:/bin:/usr/bin:/usr/local/pyenv/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/local/cuda-9.2/bin/nvcc
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
nvidia-smi
Fri Apr 12 07:58:04 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3F:00.0 Off |                    0 |
| N/A   33C    P0    26W / 250W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:40:00.0 Off |                    0 |
| N/A   34C    P0    27W / 250W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Fri Apr 12 07:58:04 UTC 2019
================================================================================
Setup Python, MPI, CuDNN, etc.
================================================================================
Python 2.7.15
Fri Apr 12 07:58:04 UTC 2019
================================================================================
Install Chainer / CuPy
================================================================================
+ export CPATH=/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ CPATH=/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ export LD_LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ LD_LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ export LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
+ LIBRARY_PATH=/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
Waiting for Chainer repository to be set up...
+ export CPATH=/usr/local/nccl/2.4.2-1/include:/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ CPATH=/usr/local/nccl/2.4.2-1/include:/cudnnenv/.cudnn/active/cuda/include:/cudnnenv/active/cuda/include:
+ export LD_LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ LD_LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/openmpi-2.1.3/lib:/cudnnenv/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ export LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
+ LIBRARY_PATH=/usr/local/nccl/2.4.2-1/lib:/cudnnenv/.cudnn/active/cuda/lib64:/cudnnenv/active/cuda/lib64:/usr/local/cuda/lib64/stubs
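[Annotation] The exports above place the cuDNN and NCCL headers and libraries on CPATH, LIBRARY_PATH, and LD_LIBRARY_PATH so the CuPy build can compile and link against them. A hedged sketch (not from process.sh) for checking that the dynamic linker actually resolves those libraries before a long build starts; depending on the install, only versioned sonames such as libcudnn.so.7 may exist:

import ctypes

# If LD_LIBRARY_PATH is wired correctly these loads succeed; a failure
# here surfaces the misconfiguration before 'pip install' does.
for lib in ('libcudnn.so', 'libnccl.so'):
    try:
        ctypes.CDLL(lib)
        print('%s: OK' % lib)
    except OSError as e:
        print('%s: NOT FOUND (%s)' % (lib, e))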
+ '[' 1 -eq 0 ']'
+ echo 'Waiting for Chainer repository to be set up...'
+ sleep 30
+ true
+ '[' -f /tmp/chainer_install_done ']'
Still waiting for /tmp/chainer_install_done to be created
+ echo 'Still waiting for /tmp/chainer_install_done to be created'
+ sleep 5
+ true
+ '[' -f /tmp/chainer_install_done ']'
Still waiting for /tmp/chainer_install_done to be created
+ echo 'Still waiting for /tmp/chainer_install_done to be created'
+ sleep 5
+ true
+ '[' -f /tmp/chainer_install_done ']'
Still waiting for /tmp/chainer_install_done to be created
+ echo 'Still waiting for /tmp/chainer_install_done to be created'
+ sleep 5
+ true
+ '[' -f /tmp/chainer_install_done ']'
Found /tmp/chainer_install_done file
+ echo 'Found /tmp/chainer_install_done file'
+ break
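[Annotation] One process performs the install; the others spin until a sentinel file appears, as the trace above shows. A minimal Python sketch of the same wait-for-file barrier (the path and messages are taken from the log; the function itself is illustrative):

import os
import time

def wait_for_sentinel(path='/tmp/chainer_install_done', poll=5):
    # Non-installing ranks block here until the installing rank
    # creates the sentinel file.
    while not os.path.isfile(path):
        print('Still waiting for %s to be created' % path)
        time.sleep(poll)
    print('Found %s file' % path)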
+ set -e
+ date
Fri Apr 12 07:58:49 UTC 2019
+ cat
================================================================================
Chainer software versions:
================================================================================
+ echo 'Chainer versions:'
Chainer versions:
+ python -c 'import chainer; print('\''Chainer '\'', chainer.__version__)'
('Chainer ', '6.0.0rc1')
+ python -c 'import cupy; print('\''Cupy '\'', cupy.__version__)'
('Cupy ', '6.0.0rc1')
+ python -c 'import chainer; chainer.print_runtime_info()'
Platform: Linux-4.4.0-116-generic-x86_64-with-debian-stretch-sid
Chainer: 6.0.0rc1
NumPy: 1.16.2
CuPy:
  CuPy Version          : 6.0.0rc1
  CUDA Root             : /usr/local/cuda-9.2
  CUDA Build Version    : 9020
  CUDA Driver Version   : 10000
  CUDA Runtime Version  : 9020
  cuDNN Build Version   : 7500
  cuDNN Version         : 7500
  NCCL Build Version    : 2402
  NCCL Runtime Version  : 2402
iDeep: Not Available
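[Annotation] The report shows a consistent stack: CUDA build and runtime both 9020 (9.2) under a driver reporting 10000 (10.0), with cuDNN 7500 and NCCL 2402 identical at build and run time. A sketch of checking the key invariant programmatically; runtimeGetVersion and driverGetVersion are real cupy.cuda.runtime functions, the assertion itself is illustrative:

import cupy

runtime = cupy.cuda.runtime.runtimeGetVersion()  # 9020 in this log
driver = cupy.cuda.runtime.driverGetVersion()    # 10000 in this log

# The driver must support at least the CUDA runtime in use.
assert driver >= runtime, 'driver %d < runtime %d' % (driver, runtime)
print('CUDA runtime %d, driver %d' % (runtime, driver))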
+ date
Fri Apr 12 07:58:54 UTC 2019
+ echo =====================================================================
=====================================================================
+ echo ' ibstat'
 ibstat
+ echo =====================================================================
=====================================================================
+ ibstat
CA 'mlx5_0'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.23.1000
    Hardware version: 0
    Node GUID: 0x506b4b03001c9e1a
    System image GUID: 0x506b4b03001c9e1a
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 351
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x506b4b03001c9e1a
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.23.1000
    Hardware version: 0
    Node GUID: 0x506b4b03001c9dce
    System image GUID: 0x506b4b03001c9dce
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 334
        LMC: 0
        SM lid: 1
        Capability mask: 0x2651e848
        Port GUID: 0x506b4b03001c9dce
        Link layer: InfiniBand
+ date
Fri Apr 12 07:58:54 UTC 2019
+ echo =====================================================================
+ echo ' nvidia-smi'
=====================================================================
 nvidia-smi
+ echo =====================================================================
=====================================================================
+ nvidia-smi
Fri Apr 12 07:58:54 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3F:00.0 Off |                    0 |
| N/A   33C    P0    26W / 250W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000000:40:00.0 Off |                    0 |
| N/A   34C    P0    27W / 250W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
+ nvidia-smi -x -q
+ grep uuid
        <uuid>GPU-f2688668-b481-6666-6305-fc9f28487066</uuid>
        <uuid>GPU-bfc28117-c095-cf68-cf17-14c444e92540</uuid>
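[Annotation] The script extracts GPU UUIDs by grepping the XML emitted by nvidia-smi -x -q. A sketch of doing the same with a real XML parser; the <gpu>/<uuid> element names match the grep output above, but the exact schema should be treated as an assumption:

import subprocess
import xml.etree.ElementTree as ET

xml = subprocess.check_output(['nvidia-smi', '-x', '-q'])
root = ET.fromstring(xml)

# Each <gpu> element carries a <uuid> child, as the grep above shows.
for gpu in root.findall('gpu'):
    print(gpu.find('uuid').text)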
+ date
Fri Apr 12 07:58:57 UTC 2019
+ echo =====================================================================
=====================================================================
+ echo ' chainermn-micro-benchmark'
 chainermn-micro-benchmark
+ echo =====================================================================
=====================================================================
+ date
Fri Apr 12 07:58:57 UTC 2019
+ cat
================================================================================
Main task
================================================================================
+ cd /chainer
+ case $KOKONA_TARGET in
+ TIMEOUT=7200
+ timeout -s KILL -k 30 7200 python -m pytest --color=yes --full-trace --duration=10 -x --capture=no -s -v -m 'not slow' tests/chainermn_tests
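[Annotation] The timeout invocation gives the suite a 7200-second budget, sends SIGKILL when it expires (-s KILL), and sends a follow-up kill 30 seconds later (-k 30) if anything survives. A rough Python 3 equivalent using only the standard library (illustrative; process.sh itself uses the coreutils timeout shown above):

import subprocess

cmd = ['python', '-m', 'pytest', '-x', '-m', 'not slow',
       'tests/chainermn_tests']
proc = subprocess.Popen(cmd)
try:
    proc.wait(timeout=7200)  # two-hour budget, as in the log
except subprocess.TimeoutExpired:
    proc.kill()              # SIGKILL, like 'timeout -s KILL'
    proc.wait()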
============================= test session starts ==============================
platform linux2 -- Python 2.7.15, pytest-4.4.0, py-1.8.0, pluggy-0.9.0 -- /usr/local/pyenv/versions/2.7.15/bin/python
cachedir: .pytest_cache
rootdir: /chainer, inifile: setup.cfg
collecting ... mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
mbind: Operation not permitted
collecting 0 items
collecting 174 items
collected 266 items / 4 deselected / 262 selected

tests/chainermn_tests/communicator_tests/test_communication_utility.py::TestCommunicationUtility::test_chunked_bcast_objs PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_cpu[param0] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param0] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param1] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param2] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param3] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param4] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param5] PASSED
tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param6] FAILED
=================================== FAILURES ===================================
________________________ test_communicator_gpu[param6] _________________________

cls = <class '_pytest.runner.CallInfo'>
func = <function <lambda> at 0x7efd48829cf8>, when = 'call'
reraise = (<class '_pytest.outcomes.Exit'>, <type 'exceptions.KeyboardInterrupt'>)

    @classmethod
    def from_call(cls, func, when, reraise=None):
        #: context of invocation: one of "setup", "call",
        #: "teardown", "memocollect"
        start = time()
        excinfo = None
        try:
>           result = func()

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/runner.py:226:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>       lambda: ihook(item=item, **kwds), when=when, reraise=reraise
    )

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/runner.py:198:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_HookCaller 'pytest_runtest_call'>, args = ()
kwargs = {'item': <Function test_communicator_gpu[param6]>}, notincall = set([])

    def __call__(self, *args, **kwargs):
        if args:
            raise TypeError("hook calling supports only keyword arguments")
        assert not self.is_historic()
        if self.spec and self.spec.argnames:
            notincall = (
                set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
            )
            if notincall:
                warnings.warn(
                    "Argument(s) {} which are declared in the hookspec "
                    "can not be found in this hook call".format(tuple(notincall)),
                    stacklevel=2,
                )
>       return self._hookexec(self, self.get_hookimpls(), kwargs)

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/hooks.py:289:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x7efd5d434850>
hook = <_HookCaller 'pytest_runtest_call'>
methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...u[param6]>>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7efd5d73aa90>>]
kwargs = {'item': <Function test_communicator_gpu[param6]>}

    def _hookexec(self, hook, methods, kwargs):
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook, methods, kwargs)

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

hook = <_HookCaller 'pytest_runtest_call'>
methods = [<HookImpl plugin_name='runner', plugin=<module '_pytest.runner' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...u[param6]>>>, <HookImpl plugin_name='logging-plugin', plugin=<_pytest.logging.LoggingPlugin object at 0x7efd5d73aa90>>]
kwargs = {'item': <Function test_communicator_gpu[param6]>}

    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
        methods,
        kwargs,
>       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
    )

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

item = <Function test_communicator_gpu[param6]>

    def pytest_runtest_call(item):
        _update_current_test_var(item, "call")
        sys.last_type, sys.last_value, sys.last_traceback = (None, None, None)
        try:
>           item.runtest()

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/runner.py:123:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Function test_communicator_gpu[param6]>

    def runtest(self):
        """ execute the underlying test function. """
>       self.ihook.pytest_pyfunc_call(pyfuncitem=self)

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/python.py:1464:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_HookCaller 'pytest_pyfunc_call'>, args = ()
kwargs = {'pyfuncitem': <Function test_communicator_gpu[param6]>}
notincall = set([])

    def __call__(self, *args, **kwargs):
        if args:
            raise TypeError("hook calling supports only keyword arguments")
        assert not self.is_historic()
        if self.spec and self.spec.argnames:
            notincall = (
                set(self.spec.argnames) - set(["__multicall__"]) - set(kwargs.keys())
            )
            if notincall:
                warnings.warn(
                    "Argument(s) {} which are declared in the hookspec "
                    "can not be found in this hook call".format(tuple(notincall)),
                    stacklevel=2,
                )
>       return self._hookexec(self, self.get_hookimpls(), kwargs)

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/hooks.py:289:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <_pytest.config.PytestPluginManager object at 0x7efd5d434850>
hook = <_HookCaller 'pytest_pyfunc_call'>
methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...=<module '_pytest.skipping' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/skipping.pyc'>>]
kwargs = {'pyfuncitem': <Function test_communicator_gpu[param6]>}

    def _hookexec(self, hook, methods, kwargs):
        # called from all hookcaller instances.
        # enable_tracing will set its own wrapping function at self._inner_hookexec
>       return self._inner_hookexec(hook, methods, kwargs)

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:68:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

hook = <_HookCaller 'pytest_pyfunc_call'>
methods = [<HookImpl plugin_name='python', plugin=<module '_pytest.python' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/...=<module '_pytest.skipping' from '/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/skipping.pyc'>>]
kwargs = {'pyfuncitem': <Function test_communicator_gpu[param6]>}

    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
        methods,
        kwargs,
>       firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
    )

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/pluggy/manager.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

pyfuncitem = <Function test_communicator_gpu[param6]>

    @hookimpl(trylast=True)
    def pytest_pyfunc_call(pyfuncitem):
        testfunction = pyfuncitem.obj
        iscoroutinefunction = getattr(inspect, "iscoroutinefunction", None)
        if iscoroutinefunction is not None and iscoroutinefunction(testfunction):
            msg = "Coroutine functions are not natively supported and have been skipped.\n"
            msg += "You need to install a suitable plugin for your async framework, for example:\n"
            msg += "  - pytest-asyncio\n"
            msg += "  - pytest-trio\n"
            msg += "  - pytest-tornasync"
            warnings.warn(PytestWarning(msg.format(pyfuncitem.nodeid)))
            skip(msg="coroutine function and no async plugin installed (see warnings)")
        funcargs = pyfuncitem.funcargs
        testargs = {arg: funcargs[arg] for arg in pyfuncitem._fixtureinfo.argnames}
>       testfunction(**testargs)

/usr/local/pyenv/versions/2.7.15/lib/python2.7/site-packages/_pytest/python.py:178:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

param = {'allreduce_grad_dtype': None,
 'batched_copy': False,
 'communicator_class': ...one,
 'gpu': False,
 'model_dtype': None,
 'multi_node': True,
 'nccl1': False}

    @pytest.mark.parametrize('param', gpu_params)
    @chainer.testing.attr.gpu
    def test_communicator_gpu(param):
        check_send_recv(param, True)
>       check_collective_communication(param, True)

tests/chainermn_tests/communicator_tests/test_communicator.py:476:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

param = {'allreduce_grad_dtype': None,
 'batched_copy': False,
 'communicator_class': ...one,
 'gpu': False,
 'model_dtype': None,
 'multi_node': True,
 'nccl1': False}
use_gpu = True

    def check_collective_communication(param, use_gpu):
        communicator = create_communicator(param, use_gpu)
        mpi_comm.barrier()

        model = ExampleModel(param.model_dtype)
        if use_gpu:
            model.to_gpu()
        check_bcast_data(communicator, model)
>       check_allreduce_grad(communicator, model)

tests/chainermn_tests/communicator_tests/test_communicator.py:444:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

communicator = <chainermn.communicators.two_dimensional_communicator.TwoDimensionalCommunicator object at 0x7efd4882dc50>
model = <test_communicator.ExampleModel object at 0x7efd4882dbd0>

    def check_allreduce_grad(communicator, model):
        # We need to repeat twice for regressions on lazy initialization of
        # sub communicators.

        for _ in range(2):
            model.a.W.grad[:] = communicator.rank
            model.b.W.grad[:] = communicator.rank + 1
            model.c.b.grad[:] = communicator.rank + 2

            communicator.allreduce_grad(model)
            base = (communicator.size - 1.0) / 2

            chainer.testing.assert_allclose(model.a.W.grad,
                                            (base + 0) * np.ones((3, 2)))
            chainer.testing.assert_allclose(model.b.W.grad,
                                            (base + 1) * np.ones((4, 3)))
            chainer.testing.assert_allclose(model.c.b.grad,
>                                           (base + 2) * np.ones((5, )))

tests/chainermn_tests/communicator_tests/test_communicator.py:316:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

x = array([3.5, 3.5, 3.5, 5. , 5. ], dtype=float32)
y = array([3.5, 3.5, 3.5, 3.5, 3.5]), atol = 1e-05, rtol = 0.0001
verbose = True

    def assert_allclose(x, y, atol=1e-5, rtol=1e-4, verbose=True):
        """Asserts if some corresponding element of x and y differs too much.

        This function can handle both CPU and GPU arrays simultaneously.

        Args:
            x: Left-hand-side array.
            y: Right-hand-side array.
            atol (float): Absolute tolerance.
            rtol (float): Relative tolerance.
            verbose (bool): If ``True``, it outputs verbose messages on error.

        """
        x = backend.CpuDevice().send(utils.force_array(x))
        y = backend.CpuDevice().send(utils.force_array(y))
        try:
            numpy.testing.assert_allclose(
                x, y, atol=atol, rtol=rtol, verbose=verbose)
        except AssertionError as e:
            f = six.StringIO()
            f.write(str(e) + '\n\n')
            f.write(
                'assert_allclose failed: \n' +
                '  shape: {} {}\n'.format(x.shape, y.shape) +
                '  dtype: {} {}\n'.format(x.dtype, y.dtype))
            if x.shape == y.shape:
                xx = numpy.atleast_1d(x)
                yy = numpy.atleast_1d(y)
                err = numpy.abs(xx - yy)
                tol_err = atol + rtol * numpy.abs(yy).astype(numpy.float64)
                i = numpy.unravel_index(
                    numpy.argmax(err.astype(numpy.float64) - tol_err), err.shape)
                if yy[i] == 0:
                    rel_err = 'inf'
                else:
                    rel_err = err[i] / numpy.abs(yy[i])
                f.write(
                    '  i: {}\n'.format(i) +
                    '  x[i]: {}\n'.format(xx[i]) +
                    '  y[i]: {}\n'.format(yy[i]) +
                    '  relative error[i]: {}\n'.format(rel_err) +
                    '  absolute error[i]: {}\n'.format(err[i]))
            opts = numpy.get_printoptions()
            try:
                numpy.set_printoptions(threshold=10000)
                f.write('x: ' + numpy.array2string(x, prefix='x: ') + '\n')
                f.write('y: ' + numpy.array2string(y, prefix='y: ') + '\n')
            finally:
                numpy.set_printoptions(**opts)
>           raise AssertionError(f.getvalue())
E           AssertionError:
E           Not equal to tolerance rtol=0.0001, atol=1e-05
E
E           Mismatch: 40%
E           Max absolute difference: 1.5
E           Max relative difference: 0.42857143
E            x: array([3.5, 3.5, 3.5, 5. , 5. ], dtype=float32)
E            y: array([3.5, 3.5, 3.5, 3.5, 3.5])
E
E           assert_allclose failed:
E             shape: (5,) (5,)
E             dtype: float32 float64
E             i: (3,)
E             x[i]: 5.0
E             y[i]: 3.5
E             relative error[i]: 0.428571428571
E             absolute error[i]: 1.5
E           x: [3.5 3.5 3.5 5.  5. ]
E           y: [3.5 3.5 3.5 3.5 3.5]

chainer/testing/array.py:59: AssertionError
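[Annotation] The failure is in allreduce_grad with the TwoDimensionalCommunicator across 4 processes: each rank fills model.c.b.grad with rank + 2, so after averaging every element should be (size - 1)/2 + 2 = 3.5, yet two of the five elements still hold 5.0, which is exactly this rank-3 process's pre-reduction value (3 + 2). A NumPy sketch of the expected arithmetic, mirroring the test's check:

import numpy as np

size = 4  # OMPI_COMM_WORLD_SIZE in this log
grads = [np.full(5, rank + 2.0) for rank in range(size)]

# allreduce_grad averages gradients across all ranks.
expected = sum(grads) / size  # (size - 1)/2 + 2 == 3.5 everywhere
print(expected)               # [3.5 3.5 3.5 3.5 3.5]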
========================== slowest 10 test durations ===========================
4.89s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param0]
2.30s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param1]
1.61s call     tests/chainermn_tests/communicator_tests/test_communication_utility.py::TestCommunicationUtility::test_chunked_bcast_objs
0.99s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param4]
0.23s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_cpu[param0]
0.22s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param5]
0.20s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param2]
0.20s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param6]
0.10s call     tests/chainermn_tests/communicator_tests/test_communicator.py::test_communicator_gpu[param3]
(0.00 durations hidden. Use -vv to show these durations.)
============== 1 failed, 8 passed, 4 deselected in 13.16 seconds ===============
------------------------------------------------------------
Error occurred on /process.sh [Line 303]: Status 1
PID: 15
Current directory: /chainer
Command line: /process.sh
------------------------------------------------------------
++ onerror 303
++ status=1
++ script=/process.sh
++ line=303
++ shift
++ args=
++ echo ''
++ echo ------------------------------------------------------------
++ echo 'Error occurred on /process.sh [Line 303]: Status 1'
++ echo ''
++ echo 'PID: 15'
+++ id
++ echo 'Current directory: /chainer'
++ echo 'Command line: /process.sh '
++ echo ------------------------------------------------------------
++ echo ''