- BLAS_APPLE : ✖ BLAS_APPLE
- BLAS_ATLAS : ✖ BLAS_ATLAS
- BLAS_MKL : ✖ BLAS_MKL
- BLAS_OPEN : ✔ BLAS_OPEN
- CAFFE : ✖ CAFFE
- CPU_AVX : ✔ CPU_AVX
- CPU_AVX2 : ✖ CPU_AVX2
- CPU_SSE : ✔ CPU_SSE
- CPU_SSE2 : ✔ CPU_SSE2
- CPU_SSE3 : ✔ CPU_SSE3
- CPU_SSE4A : ✖ CPU_SSE4A
- CPU_SSE4_1 : ✔ CPU_SSE4_1
- CPU_SSE4_2 : ✔ CPU_SSE4_2
- CUDA : ✔ CUDA
- CUDA_RTC : ✖ CUDA_RTC
- CUDNN : ✔ CUDNN
- CXX14 : ✖ CXX14
- DEBUG : ✖ DEBUG
- DIST_KVSTORE : ✖ DIST_KVSTORE
- F16C : ✔ F16C
- INT64_TENSOR_SIZE : ✖ INT64_TENSOR_SIZE
- JEMALLOC : ✖ JEMALLOC
- LAPACK : ✔ LAPACK
- MKLDNN : ✔ MKLDNN
- NCCL : ✖ NCCL
- OPENCV : ✖ OPENCV
- OPENMP : ✔ OPENMP
- PROFILER : ✖ PROFILER
- SIGNAL_HANDLER : ✖ SIGNAL_HANDLER
- SSE : ✖ SSE
- TENSORRT : ✖ TENSORRT
- TVM_OP : ✖ TVM_OP

Operator | Avg Forward Time (ms) | Avg Backward Time (ms) | Max Mem Usage (Storage) (Bytes) | Inputs |
---|---|---|---|---|
batch_dot | 28.8235 | --- | 67108.8672 | {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)} |
batch_dot | 2.749 | --- | 64000.0 | {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True} |
batch_dot | 0.4239 | --- | 6.4 | {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True} |
dot | 1.331 | 3.1084 | 2097.1521 | {'lhs': (1024, 1024), 'rhs': (1024, 1024)} |
dot | 0.2314 | 0.465 | 2000.0 | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
dot | 0.1888 | 0.094 | 0.2 | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True} |
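
To make the `Inputs` column easier to read, the sketch below works out the output shape each configuration produces. It assumes the usual matrix-multiplication rules: `dot` on two 2-D shapes, `batch_dot` on two 3-D shapes with a shared leading batch dimension, and `transpose_a` / `transpose_b` swapping the last two axes of the corresponding operand. The helper `matmul_output_shape` is a hypothetical name used only for illustration; it is not part of MXNet and does not reproduce the benchmark itself.

```python
def matmul_output_shape(lhs, rhs, transpose_a=False, transpose_b=False):
    """Return the result shape of lhs x rhs for two 2-D or two 3-D shapes.

    Hypothetical helper for illustration only; it applies plain
    matrix-multiplication shape rules and is not MXNet code.
    """
    if len(lhs) != len(rhs) or len(lhs) not in (2, 3):
        raise ValueError("expected two 2-D shapes or two 3-D shapes")

    # Everything before the last two axes is treated as the batch dimension.
    batch = tuple(lhs[:-2])
    if batch != tuple(rhs[:-2]):
        raise ValueError("batch dimensions differ")

    m, k = lhs[-2:]
    if transpose_a:
        m, k = k, m
    k_rhs, n = rhs[-2:]
    if transpose_b:
        k_rhs, n = n, k_rhs

    if k != k_rhs:
        raise ValueError(f"inner dimensions differ: {k} vs {k_rhs}")
    return batch + (m, n)


# The six input configurations listed in the table above.
configs = [
    {"lhs": (32, 1024, 1024), "rhs": (32, 1024, 1024)},
    {"lhs": (32, 1000, 10), "rhs": (32, 1000, 10), "transpose_b": True},
    {"lhs": (32, 1000, 1), "rhs": (32, 100, 1000),
     "transpose_a": True, "transpose_b": True},
    {"lhs": (1024, 1024), "rhs": (1024, 1024)},
    {"lhs": (1000, 10), "rhs": (1000, 10), "transpose_b": True},
    {"lhs": (1000, 1), "rhs": (100, 1000),
     "transpose_a": True, "transpose_b": True},
]

for inputs in configs:
    print(inputs, "->", matmul_output_shape(**inputs))
```

Under these assumptions the last configuration in each group yields a very small result, `(32, 1, 100)` for `batch_dot` and `(1, 100)` for `dot`, which fits with those rows reporting the lowest times and memory figures.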