Each benchmark is composed of 10 batches of 2-dimensional, single-precision matrices, with sizes ranging from 128x128 to 4096x4096.
Matrix dimensions: 128x128
In-place C2C FFT time for 10 runs: 0.538662 ms
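The measurement described above could be reproduced roughly as follows. This is a minimal CPU sketch, not the original benchmark: it assumes NumPy's `np.fft.fft2` as a stand-in for the GPU FFT, it is out-of-place rather than in-place, and the helper name `time_batched_fft2` and its parameters are hypothetical. It also assumes that one "run" transforms the whole batch of 10 matrices, which the source does not state explicitly.

```python
import time

import numpy as np


def time_batched_fft2(n, batches=10, runs=10):
    """Return elapsed milliseconds for `runs` 2-D C2C FFTs over a batch of n x n matrices.

    Hypothetical helper: a CPU, out-of-place stand-in for the benchmark above.
    """
    rng = np.random.default_rng(0)
    # Batch of complex single-precision inputs, shape (batches, n, n).
    data = (
        rng.standard_normal((batches, n, n))
        + 1j * rng.standard_normal((batches, n, n))
    ).astype(np.complex64)

    # Warm-up call, excluded from the timing.
    np.fft.fft2(data, axes=(-2, -1))

    start = time.perf_counter()
    for _ in range(runs):
        # Transform over the last two axes, so each matrix in the batch is handled independently.
        np.fft.fft2(data, axes=(-2, -1))
    return (time.perf_counter() - start) * 1e3


for n in (128, 256):
    elapsed_ms = time_batched_fft2(n)
    print(f"Matrix dimensions: {n}x{n}")
    print(f"C2C FFT time for 10 runs: {elapsed_ms:.6f} ms")
```

Timings from this sketch are not comparable to the figure quoted above, since they depend on the CPU and the NumPy build. On a GPU, the device would also need to be synchronized before reading the clock.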
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask.array.utils import assert_eq
import dask.array as da
import cupy as cp

add_broadcast_kernel = cp.RawKernel(
r'''
extern "C" __global__
FROM python:3
ENV PYTHON 3.7
ENV NUMPY 1.16.2
ENV UPSTREAM_DEV 1
ENV TEST true
ENV LINT true
ENV COVERAGE false
ENV PARALLEL true
ENV XTRATESTARGS ''
(lldb)
There is a running process, detach from it and attach?: [Y/n]
Process 6843 detached
_bt.cpython-37m-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 6843 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x0000000114b54ab0 _bt.cpython-37m-darwin.so`backtrace_thread [inlined] _wait_and_reset_signal at bt.c:182:35 [opt]
179 static
180 void _wait_and_reset_signal(struct sigaction *old_sa) {
181 // spin and wait.