@ToucheSir
Last active November 7, 2021 17:42
Flux RNN View vs Slice
LSTM CPU c=32 n=32 ts=4
forward
40.249 μs (53 allocations: 88.14 KiB)
backward
155.457 μs (608 allocations: 233.70 KiB)
forw and back
336.703 μs (1495 allocations: 405.92 KiB)
LSTM CUDA c=32 n=32 ts=4
forward
449.046 μs (1289 allocations: 70.70 KiB)
backward
1.079 ms (3075 allocations: 208.31 KiB)
forw and back
1.702 ms (5005 allocations: 321.98 KiB)
LSTM CPU c=32 n=32 ts=16
forward
177.645 μs (209 allocations: 364.23 KiB)
backward
584.899 μs (2300 allocations: 950.73 KiB)
forw and back
1.219 ms (5575 allocations: 1.59 MiB)
LSTM CUDA c=32 n=32 ts=16
forward
1.384 ms (5153 allocations: 282.67 KiB)
backward
3.566 ms (12219 allocations: 831.17 KiB)
forw and back
5.574 ms (19597 allocations: 1.23 MiB)
LSTM CPU c=32 n=32 ts=32
forward
360.835 μs (417 allocations: 732.36 KiB)
backward
1.154 ms (4557 allocations: 1.86 MiB)
forw and back
2.370 ms (11017 allocations: 3.19 MiB)
LSTM CUDA c=32 n=32 ts=32
forward
2.631 ms (10305 allocations: 565.30 KiB)
backward
6.852 ms (24412 allocations: 1.62 MiB)
forw and back
10.746 ms (39055 allocations: 2.46 MiB)
LSTM CPU c=32 n=32 ts=64
forward
733.224 μs (833 allocations: 1.43 MiB)
backward
2.329 ms (9069 allocations: 3.73 MiB)
forw and back
4.713 ms (21897 allocations: 6.38 MiB)
LSTM CUDA c=32 n=32 ts=64
forward
5.068 ms (20609 allocations: 1.10 MiB)
backward
13.402 ms (48796 allocations: 3.24 MiB)
forw and back
21.120 ms (77967 allocations: 4.91 MiB)
LSTM CPU c=32 n=128 ts=4
forward
153.963 μs (53 allocations: 342.64 KiB)
backward
273.555 μs (608 allocations: 722.83 KiB)
forw and back
597.819 μs (1495 allocations: 1.16 MiB)
LSTM CUDA c=32 n=128 ts=4
forward
467.039 μs (1297 allocations: 70.83 KiB)
backward
1.109 ms (3119 allocations: 209.00 KiB)
forw and back
1.761 ms (5049 allocations: 322.67 KiB)
LSTM CPU c=32 n=128 ts=16
forward
644.988 μs (209 allocations: 1.38 MiB)
backward
1.065 ms (2300 allocations: 2.86 MiB)
forw and back
2.241 ms (5575 allocations: 4.70 MiB)
LSTM CUDA c=32 n=128 ts=16
forward
1.426 ms (5185 allocations: 283.17 KiB)
backward
3.689 ms (12395 allocations: 833.92 KiB)
forw and back
5.764 ms (19773 allocations: 1.24 MiB)
LSTM CPU c=32 n=128 ts=32
forward
1.302 ms (417 allocations: 2.79 MiB)
backward
2.168 ms (4557 allocations: 5.73 MiB)
forw and back
4.473 ms (11017 allocations: 9.41 MiB)
LSTM CUDA c=32 n=128 ts=32
forward
2.663 ms (10369 allocations: 566.30 KiB)
backward
7.051 ms (24764 allocations: 1.63 MiB)
forw and back
10.977 ms (39407 allocations: 2.46 MiB)
LSTM CPU c=32 n=128 ts=64
forward
2.627 ms (833 allocations: 5.59 MiB)
backward
4.497 ms (9069 allocations: 11.46 MiB)
forw and back
8.940 ms (21897 allocations: 18.84 MiB)
LSTM CUDA c=32 n=128 ts=64
forward
5.145 ms (20737 allocations: 1.11 MiB)
backward
13.790 ms (49500 allocations: 3.26 MiB)
forw and back
21.547 ms (78671 allocations: 4.92 MiB)
LSTM CPU c=32 n=512 ts=4
forward
1.477 ms (64 allocations: 1.32 MiB)
backward
1.701 ms (636 allocations: 2.60 MiB)
forw and back
3.646 ms (1546 allocations: 4.18 MiB)
LSTM CUDA c=32 n=512 ts=4
forward
468.979 μs (1334 allocations: 71.66 KiB)
backward
1.175 ms (3147 allocations: 209.44 KiB)
forw and back
1.834 ms (5146 allocations: 325.08 KiB)
LSTM CPU c=32 n=512 ts=16
forward
5.619 ms (256 allocations: 5.46 MiB)
backward
7.090 ms (2412 allocations: 10.51 MiB)
forw and back
14.776 ms (5782 allocations: 16.99 MiB)
LSTM CUDA c=32 n=512 ts=16
forward
1.435 ms (5342 allocations: 286.62 KiB)
backward
3.871 ms (12519 allocations: 835.86 KiB)
forw and back
6.144 ms (20134 allocations: 1.24 MiB)
LSTM CPU c=32 n=512 ts=32
forward
12.595 ms (512 allocations: 10.98 MiB)
backward
14.100 ms (4781 allocations: 21.06 MiB)
forw and back
29.444 ms (11432 allocations: 34.07 MiB)
LSTM CUDA c=32 n=512 ts=32
forward
2.721 ms (10686 allocations: 573.25 KiB)
backward
7.438 ms (25016 allocations: 1.63 MiB)
forw and back
11.761 ms (40120 allocations: 2.48 MiB)
LSTM CPU c=32 n=512 ts=64
forward
25.441 ms (1024 allocations: 22.03 MiB)
backward
28.548 ms (9517 allocations: 42.15 MiB)
forw and back
58.672 ms (22728 allocations: 68.23 MiB)
LSTM CUDA c=32 n=512 ts=64
forward
5.208 ms (21374 allocations: 1.12 MiB)
backward
14.488 ms (50008 allocations: 3.26 MiB)
forw and back
23.170 ms (80088 allocations: 4.94 MiB)
LSTM CPU c=128 n=32 ts=4
forward
53.703 μs (53 allocations: 88.14 KiB)
backward
196.536 μs (608 allocations: 413.70 KiB)
forw and back
399.235 μs (1495 allocations: 537.92 KiB)
LSTM CUDA c=128 n=32 ts=4
forward
484.813 μs (1289 allocations: 70.70 KiB)
backward
1.125 ms (3075 allocations: 208.31 KiB)
forw and back
1.769 ms (5005 allocations: 321.98 KiB)
LSTM CPU c=128 n=32 ts=16
forward
232.433 μs (209 allocations: 364.23 KiB)
backward
741.138 μs (2300 allocations: 1.67 MiB)
forw and back
1.420 ms (5575 allocations: 2.14 MiB)
LSTM CUDA c=128 n=32 ts=16
forward
1.469 ms (5153 allocations: 282.67 KiB)
backward
3.697 ms (12219 allocations: 831.17 KiB)
forw and back
5.811 ms (19597 allocations: 1.23 MiB)
LSTM CPU c=128 n=32 ts=32
forward
475.835 μs (417 allocations: 732.36 KiB)
backward
1.496 ms (4557 allocations: 3.35 MiB)
forw and back
2.802 ms (11017 allocations: 4.30 MiB)
LSTM CUDA c=128 n=32 ts=32
forward
2.735 ms (10305 allocations: 565.30 KiB)
backward
7.067 ms (24412 allocations: 1.62 MiB)
forw and back
11.125 ms (39055 allocations: 2.46 MiB)
LSTM CPU c=128 n=32 ts=64
forward
959.431 μs (833 allocations: 1.43 MiB)
backward
3.159 ms (9069 allocations: 6.72 MiB)
forw and back
5.664 ms (21897 allocations: 8.62 MiB)
LSTM CUDA c=128 n=32 ts=64
forward
5.234 ms (20609 allocations: 1.10 MiB)
backward
13.496 ms (48796 allocations: 3.24 MiB)
forw and back
21.335 ms (77967 allocations: 4.91 MiB)
LSTM CPU c=128 n=128 ts=4
forward
311.561 μs (53 allocations: 342.64 KiB)
backward
1.003 ms (616 allocations: 1.16 MiB)
forw and back
1.899 ms (1499 allocations: 1.43 MiB)
LSTM CUDA c=128 n=128 ts=4
forward
476.439 μs (1297 allocations: 70.83 KiB)
backward
1.125 ms (3119 allocations: 209.00 KiB)
forw and back
1.769 ms (5049 allocations: 322.67 KiB)
LSTM CPU c=128 n=128 ts=16
forward
2.236 ms (209 allocations: 1.38 MiB)
backward
4.116 ms (2332 allocations: 4.72 MiB)
forw and back
7.460 ms (5591 allocations: 5.81 MiB)
LSTM CUDA c=128 n=128 ts=16
forward
1.438 ms (5185 allocations: 283.17 KiB)
backward
3.694 ms (12395 allocations: 833.92 KiB)
forw and back
5.800 ms (19773 allocations: 1.24 MiB)
LSTM CPU c=128 n=128 ts=32
forward
4.504 ms (417 allocations: 2.79 MiB)
backward
8.002 ms (4621 allocations: 9.46 MiB)
forw and back
14.503 ms (11049 allocations: 11.65 MiB)
LSTM CUDA c=128 n=128 ts=32
forward
2.700 ms (10369 allocations: 566.30 KiB)
backward
7.140 ms (24764 allocations: 1.63 MiB)
forw and back
11.057 ms (39407 allocations: 2.46 MiB)
LSTM CPU c=128 n=128 ts=64
forward
9.043 ms (833 allocations: 5.59 MiB)
backward
16.704 ms (9197 allocations: 18.94 MiB)
forw and back
29.730 ms (21961 allocations: 23.32 MiB)
LSTM CUDA c=128 n=128 ts=64
forward
5.205 ms (20737 allocations: 1.11 MiB)
backward
14.014 ms (49500 allocations: 3.26 MiB)
forw and back
21.858 ms (78671 allocations: 4.92 MiB)
LSTM CPU c=128 n=512 ts=4
forward
1.537 ms (64 allocations: 1.32 MiB)
backward
2.000 ms (636 allocations: 4.18 MiB)
forw and back
3.908 ms (1546 allocations: 5.01 MiB)
LSTM CUDA c=128 n=512 ts=4
forward
498.242 μs (1334 allocations: 71.66 KiB)
backward
1.193 ms (3147 allocations: 209.44 KiB)
forw and back
1.856 ms (5146 allocations: 325.08 KiB)
LSTM CPU c=128 n=512 ts=16
forward
6.646 ms (256 allocations: 5.46 MiB)
backward
8.319 ms (2412 allocations: 16.88 MiB)
forw and back
15.881 ms (5782 allocations: 20.35 MiB)
LSTM CUDA c=128 n=512 ts=16
forward
1.488 ms (5342 allocations: 286.62 KiB)
backward
3.926 ms (12519 allocations: 835.86 KiB)
forw and back
6.031 ms (20134 allocations: 1.24 MiB)
LSTM CPU c=128 n=512 ts=32
forward
13.529 ms (512 allocations: 10.98 MiB)
backward
16.747 ms (4781 allocations: 33.80 MiB)
forw and back
31.685 ms (11432 allocations: 40.81 MiB)
LSTM CUDA c=128 n=512 ts=32
forward
2.830 ms (10686 allocations: 573.25 KiB)
backward
7.573 ms (25016 allocations: 1.63 MiB)
forw and back
11.650 ms (40120 allocations: 2.48 MiB)
LSTM CPU c=128 n=512 ts=64
forward
27.174 ms (1024 allocations: 22.03 MiB)
backward
33.567 ms (9517 allocations: 67.64 MiB)
forw and back
63.216 ms (22728 allocations: 81.71 MiB)
LSTM CUDA c=128 n=512 ts=64
forward
5.410 ms (21374 allocations: 1.12 MiB)
backward
14.782 ms (50008 allocations: 3.26 MiB)
forw and back
22.877 ms (80088 allocations: 4.94 MiB)
LSTM CPU c=512 n=32 ts=4
forward
296.655 μs (53 allocations: 88.14 KiB)
backward
902.290 μs (623 allocations: 1.11 MiB)
forw and back
1.507 ms (1506 allocations: 1.04 MiB)
LSTM CUDA c=512 n=32 ts=4
forward
495.875 μs (1309 allocations: 71.02 KiB)
backward
1.152 ms (3107 allocations: 208.81 KiB)
forw and back
1.801 ms (5057 allocations: 322.80 KiB)
LSTM CPU c=512 n=32 ts=16
forward
1.234 ms (209 allocations: 364.23 KiB)
backward
3.671 ms (2363 allocations: 4.62 MiB)
forw and back
5.887 ms (5622 allocations: 4.34 MiB)
LSTM CUDA c=512 n=32 ts=16
forward
1.492 ms (5233 allocations: 283.92 KiB)
backward
3.738 ms (12347 allocations: 833.17 KiB)
forw and back
5.898 ms (19805 allocations: 1.24 MiB)
LSTM CPU c=512 n=32 ts=32
forward
2.483 ms (417 allocations: 732.36 KiB)
backward
7.368 ms (4684 allocations: 9.29 MiB)
forw and back
11.759 ms (11112 allocations: 8.75 MiB)
LSTM CUDA c=512 n=32 ts=32
forward
2.761 ms (10465 allocations: 567.80 KiB)
backward
7.149 ms (24668 allocations: 1.63 MiB)
forw and back
11.274 ms (39471 allocations: 2.46 MiB)
LSTM CPU c=512 n=32 ts=64
forward
5.011 ms (833 allocations: 1.43 MiB)
backward
12.625 ms (9324 allocations: 18.65 MiB)
forw and back
23.286 ms (22088 allocations: 17.56 MiB)
LSTM CUDA c=512 n=32 ts=64
forward
5.330 ms (20929 allocations: 1.11 MiB)
backward
13.971 ms (49308 allocations: 3.25 MiB)
forw and back
22.006 ms (78799 allocations: 4.92 MiB)
LSTM CPU c=512 n=128 ts=4
forward
598.161 μs (53 allocations: 342.64 KiB)
backward
1.382 ms (623 allocations: 2.99 MiB)
forw and back
2.212 ms (1506 allocations: 2.51 MiB)
LSTM CUDA c=512 n=128 ts=4
forward
529.644 μs (1317 allocations: 71.14 KiB)
backward
1.173 ms (3151 allocations: 209.50 KiB)
forw and back
1.868 ms (5101 allocations: 323.48 KiB)
LSTM CPU c=512 n=128 ts=16
forward
2.532 ms (209 allocations: 1.38 MiB)
backward
5.782 ms (2363 allocations: 12.17 MiB)
forw and back
8.975 ms (5622 allocations: 10.26 MiB)
LSTM CUDA c=512 n=128 ts=16
forward
1.601 ms (5265 allocations: 284.42 KiB)
backward
3.813 ms (12523 allocations: 835.92 KiB)
forw and back
6.066 ms (19981 allocations: 1.24 MiB)
LSTM CPU c=512 n=128 ts=32
forward
5.206 ms (417 allocations: 2.79 MiB)
backward
9.077 ms (4684 allocations: 24.41 MiB)
forw and back
17.836 ms (11112 allocations: 20.59 MiB)
LSTM CUDA c=512 n=128 ts=32
forward
2.990 ms (10529 allocations: 568.80 KiB)
backward
7.328 ms (25020 allocations: 1.63 MiB)
forw and back
11.564 ms (39823 allocations: 2.47 MiB)
LSTM CPU c=512 n=128 ts=64
forward
10.575 ms (833 allocations: 5.59 MiB)
backward
23.239 ms (9324 allocations: 48.88 MiB)
forw and back
35.565 ms (22088 allocations: 41.27 MiB)
LSTM CUDA c=512 n=128 ts=64
forward
5.804 ms (21057 allocations: 1.11 MiB)
backward
14.272 ms (50012 allocations: 3.26 MiB)
forw and back
22.629 ms (79503 allocations: 4.93 MiB)
LSTM CPU c=512 n=512 ts=4
forward
1.775 ms (64 allocations: 1.32 MiB)
backward
3.575 ms (643 allocations: 10.51 MiB)
forw and back
5.246 ms (1553 allocations: 8.34 MiB)
LSTM CUDA c=512 n=512 ts=4
forward
524.592 μs (1354 allocations: 71.97 KiB)
backward
1.235 ms (3179 allocations: 209.94 KiB)
forw and back
1.906 ms (5198 allocations: 325.89 KiB)
LSTM CPU c=512 n=512 ts=16
forward
7.986 ms (256 allocations: 5.46 MiB)
backward
14.480 ms (2443 allocations: 42.33 MiB)
forw and back
20.785 ms (5813 allocations: 33.80 MiB)
LSTM CUDA c=512 n=512 ts=16
forward
1.586 ms (5422 allocations: 287.88 KiB)
backward
4.138 ms (12647 allocations: 837.86 KiB)
forw and back
6.273 ms (20342 allocations: 1.25 MiB)
LSTM CPU c=512 n=512 ts=32
forward
16.118 ms (512 allocations: 10.98 MiB)
backward
29.159 ms (4844 allocations: 84.75 MiB)
forw and back
41.383 ms (11495 allocations: 67.75 MiB)
LSTM CUDA c=512 n=512 ts=32
forward
3.017 ms (10846 allocations: 575.75 KiB)
backward
7.970 ms (25272 allocations: 1.64 MiB)
forw and back
12.031 ms (40536 allocations: 2.48 MiB)
LSTM CPU c=512 n=512 ts=64
forward
32.323 ms (1024 allocations: 22.03 MiB)
backward
58.327 ms (9644 allocations: 169.59 MiB)
forw and back
82.442 ms (22855 allocations: 135.66 MiB)
LSTM CUDA c=512 n=512 ts=64
forward
5.773 ms (21694 allocations: 1.12 MiB)
backward
16.019 ms (50520 allocations: 3.27 MiB)
forw and back
23.525 ms (80920 allocations: 4.96 MiB)
GRU CPU c=32 n=32 ts=4
forward
37.056 μs (49 allocations: 61.70 KiB)
backward
171.009 μs (844 allocations: 246.17 KiB)
forw and back
350.779 μs (1737 allocations: 428.23 KiB)
GRU CUDA c=32 n=32 ts=4
forward
486.435 μs (1429 allocations: 82.20 KiB)
backward
1.490 ms (3466 allocations: 213.53 KiB)
forw and back
2.230 ms (5809 allocations: 364.19 KiB)
GRU CPU c=32 n=32 ts=16
forward
164.148 μs (193 allocations: 264.30 KiB)
backward
672.080 μs (3149 allocations: 1.00 MiB)
forw and back
1.278 ms (6479 allocations: 1.72 MiB)
GRU CUDA c=32 n=32 ts=16
forward
1.452 ms (5713 allocations: 328.67 KiB)
backward
4.948 ms (13607 allocations: 844.50 KiB)
forw and back
7.461 ms (22671 allocations: 1.39 MiB)
GRU CPU c=32 n=32 ts=32
forward
338.151 μs (385 allocations: 534.42 KiB)
backward
1.315 ms (6221 allocations: 2.02 MiB)
forw and back
2.501 ms (12799 allocations: 3.44 MiB)
GRU CUDA c=32 n=32 ts=32
forward
2.712 ms (11425 allocations: 657.30 KiB)
backward
9.540 ms (27127 allocations: 1.65 MiB)
forw and back
14.249 ms (45151 allocations: 2.77 MiB)
GRU CPU c=32 n=32 ts=64
forward
682.412 μs (769 allocations: 1.05 MiB)
backward
2.647 ms (12365 allocations: 4.06 MiB)
forw and back
4.982 ms (25439 allocations: 6.90 MiB)
GRU CUDA c=32 n=32 ts=64
forward
5.276 ms (22849 allocations: 1.28 MiB)
backward
18.634 ms (54167 allocations: 3.29 MiB)
forw and back
27.950 ms (90111 allocations: 5.54 MiB)
GRU CPU c=32 n=128 ts=4
forward
145.167 μs (49 allocations: 238.02 KiB)
backward
298.838 μs (844 allocations: 747.30 KiB)
forw and back
595.941 μs (1737 allocations: 1.15 MiB)
GRU CUDA c=32 n=128 ts=4
forward
475.030 μs (1437 allocations: 82.33 KiB)
backward
1.497 ms (3502 allocations: 214.09 KiB)
forw and back
2.254 ms (5845 allocations: 364.75 KiB)
GRU CPU c=32 n=128 ts=16
forward
616.850 μs (193 allocations: 1.00 MiB)
backward
1.153 ms (3149 allocations: 3.09 MiB)
forw and back
2.215 ms (6479 allocations: 4.83 MiB)
GRU CUDA c=32 n=128 ts=16
forward
1.444 ms (5745 allocations: 329.17 KiB)
backward
5.056 ms (13751 allocations: 846.75 KiB)
forw and back
7.498 ms (22815 allocations: 1.39 MiB)
GRU CPU c=32 n=128 ts=32
forward
1.242 ms (385 allocations: 2.02 MiB)
backward
2.346 ms (6221 allocations: 6.23 MiB)
forw and back
4.434 ms (12799 allocations: 9.73 MiB)
GRU CUDA c=32 n=128 ts=32
forward
2.699 ms (11489 allocations: 658.30 KiB)
backward
9.703 ms (27415 allocations: 1.65 MiB)
forw and back
14.495 ms (45439 allocations: 2.78 MiB)
GRU CPU c=32 n=128 ts=64
forward
2.507 ms (769 allocations: 4.07 MiB)
backward
4.859 ms (12365 allocations: 12.51 MiB)
forw and back
8.842 ms (25439 allocations: 19.54 MiB)
GRU CUDA c=32 n=128 ts=64
forward
5.261 ms (22977 allocations: 1.29 MiB)
backward
18.872 ms (54743 allocations: 3.30 MiB)
forw and back
28.257 ms (90687 allocations: 5.55 MiB)
GRU CPU c=32 n=512 ts=4
forward
1.406 ms (56 allocations: 933.47 KiB)
backward
1.806 ms (880 allocations: 2.67 MiB)
forw and back
3.487 ms (1788 allocations: 4.06 MiB)
GRU CUDA c=32 n=512 ts=4
forward
462.937 μs (1458 allocations: 82.66 KiB)
backward
1.541 ms (3594 allocations: 217.84 KiB)
forw and back
2.293 ms (5998 allocations: 370.22 KiB)
GRU CPU c=32 n=512 ts=16
forward
5.263 ms (224 allocations: 3.93 MiB)
backward
7.605 ms (3305 allocations: 11.35 MiB)
forw and back
14.322 ms (6698 allocations: 17.15 MiB)
GRU CUDA c=32 n=512 ts=16
forward
1.437 ms (5838 allocations: 330.62 KiB)
backward
5.297 ms (14131 allocations: 861.94 KiB)
forw and back
7.771 ms (23400 allocations: 1.41 MiB)
GRU CPU c=32 n=512 ts=32
forward
11.747 ms (448 allocations: 7.95 MiB)
backward
15.211 ms (6537 allocations: 22.91 MiB)
forw and back
28.691 ms (13242 allocations: 34.60 MiB)
GRU CUDA c=32 n=512 ts=32
forward
2.719 ms (11678 allocations: 661.25 KiB)
backward
10.270 ms (28179 allocations: 1.68 MiB)
forw and back
15.028 ms (46600 allocations: 2.82 MiB)
GRU CPU c=32 n=512 ts=64
forward
23.613 ms (896 allocations: 15.99 MiB)
backward
30.489 ms (13001 allocations: 46.05 MiB)
forw and back
56.960 ms (26330 allocations: 69.50 MiB)
GRU CUDA c=32 n=512 ts=64
forward
5.290 ms (23358 allocations: 1.29 MiB)
backward
20.239 ms (56275 allocations: 3.36 MiB)
forw and back
30.735 ms (93000 allocations: 5.62 MiB)
GRU CPU c=128 n=32 ts=4
forward
48.161 μs (49 allocations: 61.70 KiB)
backward
215.723 μs (844 allocations: 405.17 KiB)
forw and back
408.124 μs (1737 allocations: 539.23 KiB)
GRU CUDA c=128 n=32 ts=4
forward
482.053 μs (1429 allocations: 82.20 KiB)
backward
1.496 ms (3466 allocations: 213.53 KiB)
forw and back
2.217 ms (5809 allocations: 364.19 KiB)
GRU CPU c=128 n=32 ts=16
forward
209.448 μs (193 allocations: 264.30 KiB)
backward
795.439 μs (3149 allocations: 1.65 MiB)
forw and back
1.444 ms (6479 allocations: 2.18 MiB)
GRU CUDA c=128 n=32 ts=16
forward
1.489 ms (5713 allocations: 328.67 KiB)
backward
4.985 ms (13607 allocations: 844.50 KiB)
forw and back
7.435 ms (22671 allocations: 1.39 MiB)
GRU CPU c=128 n=32 ts=32
forward
426.944 μs (385 allocations: 534.42 KiB)
backward
1.595 ms (6221 allocations: 3.33 MiB)
forw and back
2.840 ms (12799 allocations: 4.37 MiB)
GRU CUDA c=128 n=32 ts=32
forward
2.745 ms (11425 allocations: 657.30 KiB)
backward
9.446 ms (27127 allocations: 1.65 MiB)
forw and back
14.175 ms (45151 allocations: 2.77 MiB)
GRU CPU c=128 n=32 ts=64
forward
859.223 μs (769 allocations: 1.05 MiB)
backward
3.385 ms (12365 allocations: 6.68 MiB)
forw and back
5.770 ms (25439 allocations: 8.77 MiB)
GRU CUDA c=128 n=32 ts=64
forward
5.302 ms (22849 allocations: 1.28 MiB)
backward
18.317 ms (54167 allocations: 3.29 MiB)
forw and back
27.563 ms (90111 allocations: 5.54 MiB)
GRU CPU c=128 n=128 ts=4
forward
494.162 μs (49 allocations: 238.02 KiB)
backward
1.078 ms (852 allocations: 1.17 MiB)
forw and back
1.902 ms (1741 allocations: 1.40 MiB)
GRU CUDA c=128 n=128 ts=4
forward
465.380 μs (1437 allocations: 82.33 KiB)
backward
1.477 ms (3502 allocations: 214.09 KiB)
forw and back
2.239 ms (5845 allocations: 364.75 KiB)
GRU CPU c=128 n=128 ts=16
forward
2.125 ms (193 allocations: 1.00 MiB)
backward
4.355 ms (3181 allocations: 4.86 MiB)
forw and back
7.485 ms (6495 allocations: 5.85 MiB)
GRU CUDA c=128 n=128 ts=16
forward
1.477 ms (5745 allocations: 329.17 KiB)
backward
5.129 ms (13751 allocations: 846.75 KiB)
forw and back
7.632 ms (22815 allocations: 1.39 MiB)
GRU CPU c=128 n=128 ts=32
forward
4.230 ms (385 allocations: 2.02 MiB)
backward
8.788 ms (6285 allocations: 9.78 MiB)
forw and back
14.835 ms (12831 allocations: 11.78 MiB)
GRU CUDA c=128 n=128 ts=32
forward
2.758 ms (11489 allocations: 658.30 KiB)
backward
9.839 ms (27415 allocations: 1.65 MiB)
forw and back
14.647 ms (45439 allocations: 2.78 MiB)
GRU CPU c=128 n=128 ts=64
forward
8.546 ms (769 allocations: 4.07 MiB)
backward
17.662 ms (12493 allocations: 19.62 MiB)
forw and back
30.030 ms (25503 allocations: 23.65 MiB)
GRU CUDA c=128 n=128 ts=64
forward
5.346 ms (22977 allocations: 1.29 MiB)
backward
19.017 ms (54743 allocations: 3.30 MiB)
forw and back
28.545 ms (90687 allocations: 5.55 MiB)
GRU CPU c=128 n=512 ts=4
forward
1.467 ms (56 allocations: 933.47 KiB)
backward
2.078 ms (880 allocations: 4.23 MiB)
forw and back
3.707 ms (1788 allocations: 4.87 MiB)
GRU CUDA c=128 n=512 ts=4
forward
471.027 μs (1458 allocations: 82.66 KiB)
backward
1.575 ms (3594 allocations: 217.84 KiB)
forw and back
2.332 ms (5998 allocations: 370.22 KiB)
GRU CPU c=128 n=512 ts=16
forward
6.161 ms (224 allocations: 3.93 MiB)
backward
8.740 ms (3305 allocations: 17.62 MiB)
forw and back
15.252 ms (6698 allocations: 20.42 MiB)
GRU CUDA c=128 n=512 ts=16
forward
1.481 ms (5838 allocations: 330.62 KiB)
backward
5.411 ms (14131 allocations: 861.94 KiB)
forw and back
7.896 ms (23400 allocations: 1.41 MiB)
GRU CPU c=128 n=512 ts=32
forward
10.098 ms (448 allocations: 7.95 MiB)
backward
17.682 ms (6537 allocations: 35.47 MiB)
forw and back
30.554 ms (13242 allocations: 41.15 MiB)
GRU CUDA c=128 n=512 ts=32
forward
2.800 ms (11678 allocations: 661.25 KiB)
backward
10.446 ms (28179 allocations: 1.68 MiB)
forw and back
15.398 ms (46600 allocations: 2.82 MiB)
GRU CPU c=128 n=512 ts=64
forward
25.142 ms (896 allocations: 15.99 MiB)
backward
35.435 ms (13001 allocations: 71.16 MiB)
forw and back
60.727 ms (26330 allocations: 82.62 MiB)
GRU CUDA c=128 n=512 ts=64
forward
5.479 ms (23358 allocations: 1.29 MiB)
backward
20.468 ms (56275 allocations: 3.36 MiB)
forw and back
30.247 ms (93000 allocations: 5.62 MiB)
GRU CPU c=512 n=32 ts=4
forward
311.081 μs (49 allocations: 61.70 KiB)
backward
937.930 μs (859 allocations: 1.02 MiB)
forw and back
1.183 ms (1748 allocations: 982.38 KiB)
GRU CUDA c=512 n=32 ts=4
forward
500.746 μs (1449 allocations: 82.52 KiB)
backward
1.524 ms (3498 allocations: 214.03 KiB)
forw and back
2.272 ms (5861 allocations: 365.00 KiB)
GRU CPU c=512 n=32 ts=16
forward
1.297 ms (193 allocations: 264.30 KiB)
backward
3.842 ms (3212 allocations: 4.24 MiB)
forw and back
6.129 ms (6526 allocations: 4.01 MiB)
GRU CUDA c=512 n=32 ts=16
forward
1.538 ms (5793 allocations: 329.92 KiB)
backward
5.103 ms (13735 allocations: 846.50 KiB)
forw and back
7.769 ms (22879 allocations: 1.40 MiB)
GRU CPU c=512 n=32 ts=32
forward
2.623 ms (385 allocations: 534.42 KiB)
backward
7.902 ms (6348 allocations: 8.53 MiB)
forw and back
12.426 ms (12894 allocations: 8.08 MiB)
GRU CUDA c=512 n=32 ts=32
forward
2.858 ms (11585 allocations: 659.80 KiB)
backward
9.831 ms (27383 allocations: 1.65 MiB)
forw and back
14.704 ms (45567 allocations: 2.78 MiB)
GRU CPU c=512 n=32 ts=64
forward
5.302 ms (769 allocations: 1.05 MiB)
backward
15.853 ms (12620 allocations: 17.12 MiB)
forw and back
24.895 ms (25630 allocations: 16.22 MiB)
GRU CUDA c=512 n=32 ts=64
forward
5.513 ms (23169 allocations: 1.29 MiB)
backward
18.977 ms (54679 allocations: 3.30 MiB)
forw and back
28.441 ms (90943 allocations: 5.55 MiB)
GRU CPU c=512 n=128 ts=4
forward
587.296 μs (49 allocations: 238.02 KiB)
backward
1.421 ms (859 allocations: 2.91 MiB)
forw and back
2.214 ms (1748 allocations: 2.40 MiB)
GRU CUDA c=512 n=128 ts=4
forward
490.172 μs (1457 allocations: 82.64 KiB)
backward
1.532 ms (3534 allocations: 214.59 KiB)
forw and back
2.317 ms (5897 allocations: 365.56 KiB)
GRU CPU c=512 n=128 ts=16
forward
2.443 ms (193 allocations: 1.00 MiB)
backward
6.062 ms (3212 allocations: 11.94 MiB)
forw and back
8.953 ms (6526 allocations: 9.94 MiB)
GRU CUDA c=512 n=128 ts=16
forward
1.482 ms (5825 allocations: 330.42 KiB)
backward
5.141 ms (13879 allocations: 848.75 KiB)
forw and back
7.694 ms (23023 allocations: 1.40 MiB)
GRU CPU c=512 n=128 ts=32
forward
4.997 ms (385 allocations: 2.02 MiB)
backward
12.083 ms (6348 allocations: 23.99 MiB)
forw and back
17.802 ms (12894 allocations: 19.99 MiB)
GRU CUDA c=512 n=128 ts=32
forward
2.768 ms (11649 allocations: 660.80 KiB)
backward
9.899 ms (27671 allocations: 1.65 MiB)
forw and back
14.722 ms (45855 allocations: 2.79 MiB)
GRU CPU c=512 n=128 ts=64
forward
10.132 ms (769 allocations: 4.07 MiB)
backward
24.323 ms (12620 allocations: 48.07 MiB)
forw and back
35.536 ms (25630 allocations: 40.10 MiB)
GRU CUDA c=512 n=128 ts=64
forward
5.351 ms (23297 allocations: 1.29 MiB)
backward
19.270 ms (55255 allocations: 3.31 MiB)
forw and back
28.835 ms (91519 allocations: 5.56 MiB)
GRU CPU c=512 n=512 ts=4
forward
1.769 ms (56 allocations: 933.47 KiB)
backward
3.704 ms (887 allocations: 10.48 MiB)
forw and back
5.138 ms (1795 allocations: 8.11 MiB)
GRU CUDA c=512 n=512 ts=4
forward
522.495 μs (1478 allocations: 82.97 KiB)
backward
1.630 ms (3626 allocations: 218.34 KiB)
forw and back
2.422 ms (6050 allocations: 371.03 KiB)
GRU CPU c=512 n=512 ts=16
forward
7.603 ms (224 allocations: 3.93 MiB)
backward
14.938 ms (3336 allocations: 42.71 MiB)
forw and back
20.314 ms (6729 allocations: 33.50 MiB)
GRU CUDA c=512 n=512 ts=16
forward
1.633 ms (5918 allocations: 331.88 KiB)
backward
5.625 ms (14259 allocations: 863.94 KiB)
forw and back
8.189 ms (23608 allocations: 1.42 MiB)
GRU CPU c=512 n=512 ts=32
forward
15.317 ms (448 allocations: 7.95 MiB)
backward
30.130 ms (6600 allocations: 85.68 MiB)
forw and back
40.710 ms (13305 allocations: 67.36 MiB)
GRU CUDA c=512 n=512 ts=32
forward
3.070 ms (11838 allocations: 663.75 KiB)
backward
10.983 ms (28435 allocations: 1.68 MiB)
forw and back
15.912 ms (47016 allocations: 2.82 MiB)
GRU CPU c=512 n=512 ts=64
forward
30.749 ms (896 allocations: 15.99 MiB)
backward
60.205 ms (13128 allocations: 171.62 MiB)
forw and back
80.993 ms (26457 allocations: 135.07 MiB)
GRU CUDA c=512 n=512 ts=64
forward
6.003 ms (23678 allocations: 1.30 MiB)
backward
22.428 ms (56787 allocations: 3.37 MiB)
forw and back
31.418 ms (93832 allocations: 5.64 MiB)
branch rnn_type device features batch_size timesteps passes time_μs alloc_num alloc_KiB
master LSTM CPU 32 32 4 forward 39.939998626709 53 88.1399993896484
master LSTM CPU 32 32 4 backward 156.791000366211 608 233.699996948242
master LSTM CPU 32 32 4 forw and back 334.136993408203 1495 405.920013427734
master LSTM CUDA 32 32 4 forward 443.264007568359 1289 70.6999969482422
master LSTM CUDA 32 32 4 backward 1068.0 3075 208.309997558594
master LSTM CUDA 32 32 4 forw and back 1699.0 5005 321.980010986328
master LSTM CPU 32 32 16 forward 176.345993041992 209 364.230010986328
master LSTM CPU 32 32 16 backward 587.888977050781 2300 950.72998046875
master LSTM CPU 32 32 16 forw and back 1216.0 5575 1590.0
master LSTM CUDA 32 32 16 forward 1384.0 5153 282.670013427734
master LSTM CUDA 32 32 16 backward 3591.0 12219 831.169982910156
master LSTM CUDA 32 32 16 forw and back 5630.0 19597 1230.0
master LSTM CPU 32 32 32 forward 361.653015136719 417 732.359985351562
master LSTM CPU 32 32 32 backward 1169.0 4557 1860.0
master LSTM CPU 32 32 32 forw and back 2377.0 11017 3190.0
master LSTM CUDA 32 32 32 forward 2631.0 10305 565.299987792969
master LSTM CUDA 32 32 32 backward 6895.0 24412 1620.0
master LSTM CUDA 32 32 32 forw and back 10829.0 39055 2460.0
master LSTM CPU 32 32 64 forward 734.004028320312 833 1430.0
master LSTM CPU 32 32 64 backward 2354.0 9069 3730.0
master LSTM CPU 32 32 64 forw and back 4719.0 21897 6380.0
master LSTM CUDA 32 32 64 forward 5097.0 20609 1100.0
master LSTM CUDA 32 32 64 backward 13517.0 48796 3240.0
master LSTM CUDA 32 32 64 forw and back 21213.0 77967 4910.0
master LSTM CPU 32 128 4 forward 155.343994140625 53 342.640014648437
master LSTM CPU 32 128 4 backward 275.822998046875 608 722.830017089844
master LSTM CPU 32 128 4 forw and back 600.094970703125 1495 1160.0
master LSTM CUDA 32 128 4 forward 462.463989257812 1297 70.8300018310547
master LSTM CUDA 32 128 4 backward 1115.0 3119 209.0
master LSTM CUDA 32 128 4 forw and back 1754.0 5049 322.670013427734
master LSTM CPU 32 128 16 forward 651.596984863281 209 1380.0
master LSTM CPU 32 128 16 backward 1069.0 2300 2860.0
master LSTM CPU 32 128 16 forw and back 2247.0 5575 4700.0
master LSTM CUDA 32 128 16 forward 1414.0 5185 283.170013427734
master LSTM CUDA 32 128 16 backward 3729.0 12395 833.919982910156
master LSTM CUDA 32 128 16 forw and back 5785.0 19773 1240.0
master LSTM CPU 32 128 32 forward 1309.0 417 2790.0
master LSTM CPU 32 128 32 backward 2192.0 4557 5730.0
master LSTM CPU 32 128 32 forw and back 4490.0 11017 9410.0
master LSTM CUDA 32 128 32 forward 2656.0 10369 566.299987792969
master LSTM CUDA 32 128 32 backward 7144.0 24764 1630.0
master LSTM CUDA 32 128 32 forw and back 11062.0 39407 2460.0
master LSTM CPU 32 128 64 forward 2630.0 833 5590.0
master LSTM CPU 32 128 64 backward 4550.0 9069 11460.0
master LSTM CPU 32 128 64 forw and back 8997.0 21897 18840.0
master LSTM CUDA 32 128 64 forward 5159.0 20737 1110.0
master LSTM CUDA 32 128 64 backward 13942.0 49500 3260.0
master LSTM CUDA 32 128 64 forw and back 21668.0 78671 4920.0
master LSTM CPU 32 512 4 forward 1488.0 64 1320.0
master LSTM CPU 32 512 4 backward 1690.0 636 2600.0
master LSTM CPU 32 512 4 forw and back 3664.0 1546 4180.0
master LSTM CUDA 32 512 4 forward 461.363006591797 1334 71.6600036621094
master LSTM CUDA 32 512 4 backward 1166.0 3147 209.440002441406
master LSTM CUDA 32 512 4 forw and back 1813.0 5146 325.079986572266
master LSTM CPU 32 512 16 forward 6258.0 256 5460.0
master LSTM CPU 32 512 16 backward 7021.0 2412 10510.0
master LSTM CPU 32 512 16 forw and back 14799.0 5782 16990.0
master LSTM CUDA 32 512 16 forward 1418.0 5342 286.619995117187
master LSTM CUDA 32 512 16 backward 3877.0 12519 835.859985351563
master LSTM CUDA 32 512 16 forw and back 6108.0 20134 1240.0
master LSTM CPU 32 512 32 forward 12599.0 512 10980.0
master LSTM CPU 32 512 32 backward 14229.0 4781 21060.0
master LSTM CPU 32 512 32 forw and back 29445.0 11432 34070.0
master LSTM CUDA 32 512 32 forward 2690.0 10686 573.25
master LSTM CUDA 32 512 32 backward 7451.0 25016 1630.0
master LSTM CUDA 32 512 32 forw and back 11705.0 40120 2480.0
master LSTM CPU 32 512 64 forward 25296.0 1024 22030.0
master LSTM CPU 32 512 64 backward 28496.0 9517 42150.0
master LSTM CPU 32 512 64 forw and back 58946.0 22728 68230.0
master LSTM CUDA 32 512 64 forward 5197.0 21374 1120.0
master LSTM CUDA 32 512 64 backward 14618.0 50008 3260.0
master LSTM CUDA 32 512 64 forw and back 23281.0 80088 4940.0
master LSTM CPU 128 32 4 forward 54.4169998168945 53 88.1399993896484
master LSTM CPU 128 32 4 backward 199.080001831055 608 413.700012207031
master LSTM CPU 128 32 4 forw and back 399.412994384766 1495 537.919982910156
master LSTM CUDA 128 32 4 forward 480.368011474609 1289 70.6999969482422
master LSTM CUDA 128 32 4 backward 1125.0 3075 208.309997558594
master LSTM CUDA 128 32 4 forw and back 1774.0 5005 321.980010986328
master LSTM CPU 128 32 16 forward 232.804992675781 209 364.230010986328
master LSTM CPU 128 32 16 backward 751.56201171875 2300 1670.0
master LSTM CPU 128 32 16 forw and back 1429.0 5575 2140.0
master LSTM CUDA 128 32 16 forward 1463.0 5153 282.670013427734
master LSTM CUDA 128 32 16 backward 3738.0 12219 831.169982910156
master LSTM CUDA 128 32 16 forw and back 5875.0 19597 1230.0
master LSTM CPU 128 32 32 forward 475.282989501953 417 732.359985351562
master LSTM CPU 128 32 32 backward 1525.0 4557 3350.0
master LSTM CPU 128 32 32 forw and back 2822.0 11017 4300.0
master LSTM CUDA 128 32 32 forward 2733.0 10305 565.299987792969
master LSTM CUDA 128 32 32 backward 7187.0 24412 1620.0
master LSTM CUDA 128 32 32 forw and back 11243.0 39055 2460.0
master LSTM CPU 128 32 64 forward 960.460998535156 833 1430.0
master LSTM CPU 128 32 64 backward 3211.0 9069 6720.0
master LSTM CPU 128 32 64 forw and back 5705.0 21897 8620.0
master LSTM CUDA 128 32 64 forward 5253.0 20609 1100.0
master LSTM CUDA 128 32 64 backward 13694.0 48796 3240.0
master LSTM CUDA 128 32 64 forw and back 21541.0 77967 4910.0
master LSTM CPU 128 128 4 forward 531.2509765625 53 342.640014648437
master LSTM CPU 128 128 4 backward 1021.0 616 1160.0
master LSTM CPU 128 128 4 forw and back 1912.0 1499 1430.0
master LSTM CUDA 128 128 4 forward 472.183013916016 1297 70.8300018310547
master LSTM CUDA 128 128 4 backward 1133.0 3119 209.0
master LSTM CUDA 128 128 4 forw and back 1789.0 5049 322.670013427734
master LSTM CPU 128 128 16 forward 1185.0 209 1380.0
master LSTM CPU 128 128 16 backward 4145.0 2332 4720.0
master LSTM CPU 128 128 16 forw and back 7534.0 5591 5810.0
master LSTM CUDA 128 128 16 forward 1443.0 5185 283.170013427734
master LSTM CUDA 128 128 16 backward 3749.0 12395 833.919982910156
master LSTM CUDA 128 128 16 forw and back 5830.0 19773 1240.0
master LSTM CPU 128 128 32 forward 4505.0 417 2790.0
master LSTM CPU 128 128 32 backward 8352.0 4621 9460.0
master LSTM CPU 128 128 32 forw and back 15025.0 11049 11650.0
master LSTM CUDA 128 128 32 forward 2710.0 10369 566.299987792969
master LSTM CUDA 128 128 32 backward 7201.0 24764 1630.0
master LSTM CUDA 128 128 32 forw and back 11155.0 39407 2460.0
master LSTM CPU 128 128 64 forward 9102.0 833 5590.0
master LSTM CPU 128 128 64 backward 16729.0 9197 18940.0
master LSTM CPU 128 128 64 forw and back 30072.0 21961 23320.0
master LSTM CUDA 128 128 64 forward 5187.0 20737 1110.0
master LSTM CUDA 128 128 64 backward 14091.0 49500 3260.0
master LSTM CUDA 128 128 64 forw and back 21876.0 78671 4920.0
master LSTM CPU 128 512 4 forward 1558.0 64 1320.0
master LSTM CPU 128 512 4 backward 1252.0 636 4180.0
master LSTM CPU 128 512 4 forw and back 3926.0 1546 5010.0
master LSTM CUDA 128 512 4 forward 496.510009765625 1334 71.6600036621094
master LSTM CUDA 128 512 4 backward 1194.0 3147 209.440002441406
master LSTM CUDA 128 512 4 forw and back 1863.0 5146 325.079986572266
master LSTM CPU 128 512 16 forward 6626.0 256 5460.0
master LSTM CPU 128 512 16 backward 7625.0 2412 16880.0
master LSTM CPU 128 512 16 forw and back 15896.0 5782 20350.0
master LSTM CUDA 128 512 16 forward 1492.0 5342 286.619995117187
master LSTM CUDA 128 512 16 backward 3982.0 12519 835.859985351563
master LSTM CUDA 128 512 16 forw and back 6097.0 20134 1240.0
master LSTM CPU 128 512 32 forward 12597.0 512 10980.0
master LSTM CPU 128 512 32 backward 16735.0 4781 33800.0
master LSTM CPU 128 512 32 forw and back 20492.0 11432 40810.0
master LSTM CUDA 128 512 32 forward 2865.0 10686 573.25
master LSTM CUDA 128 512 32 backward 7670.0 25016 1630.0
master LSTM CUDA 128 512 32 forw and back 11737.0 40120 2480.0
master LSTM CPU 128 512 64 forward 27121.0 1024 22030.0
master LSTM CPU 128 512 64 backward 33522.0 9517 67640.0
master LSTM CPU 128 512 64 forw and back 63381.0 22728 81710.0
master LSTM CUDA 128 512 64 forward 5428.0 21374 1120.0
master LSTM CUDA 128 512 64 backward 14902.0 50008 3260.0
master LSTM CUDA 128 512 64 forw and back 23089.0 80088 4940.0
master LSTM CPU 512 32 4 forward 299.046997070312 53 88.1399993896484
master LSTM CPU 512 32 4 backward 904.810974121094 623 1110.0
master LSTM CPU 512 32 4 forw and back 1529.0 1506 1040.0
master LSTM CUDA 512 32 4 forward 494.730987548828 1309 71.0199966430664
master LSTM CUDA 512 32 4 backward 1159.0 3107 208.809997558594
master LSTM CUDA 512 32 4 forw and back 1807.0 5057 322.799987792969
master LSTM CPU 512 32 16 forward 1239.0 209 364.230010986328
master LSTM CPU 512 32 16 backward 3695.0 2363 4620.0
master LSTM CPU 512 32 16 forw and back 5932.0 5622 4340.0
master LSTM CUDA 512 32 16 forward 1487.0 5233 283.920013427734
master LSTM CUDA 512 32 16 backward 3770.0 12347 833.169982910156
master LSTM CUDA 512 32 16 forw and back 5973.0 19805 1240.0
master LSTM CPU 512 32 32 forward 2487.0 417 732.359985351562
master LSTM CPU 512 32 32 backward 7416.0 4684 9290.0
master LSTM CPU 512 32 32 forw and back 11767.0 11112 8750.0
master LSTM CUDA 512 32 32 forward 2779.0 10465 567.799987792969
master LSTM CUDA 512 32 32 backward 7274.0 24668 1630.0
master LSTM CUDA 512 32 32 forw and back 11415.0 39471 2460.0
master LSTM CPU 512 32 64 forward 4289.0 833 1430.0
master LSTM CPU 512 32 64 backward 14788.0 9324 18650.0
master LSTM CPU 512 32 64 forw and back 23512.0 22088 17560.0
master LSTM CUDA 512 32 64 forward 5363.0 20929 1110.0
master LSTM CUDA 512 32 64 backward 14163.0 49308 3250.0
master LSTM CUDA 512 32 64 forw and back 22199.0 78799 4920.0
master LSTM CPU 512 128 4 forward 603.39599609375 53 342.640014648437
master LSTM CPU 512 128 4 backward 1379.0 623 2990.0
master LSTM CPU 512 128 4 forw and back 2237.0 1506 2510.0
master LSTM CUDA 512 128 4 forward 527.679992675781 1317 71.1399993896484
master LSTM CUDA 512 128 4 backward 1188.0 3151 209.5
master LSTM CUDA 512 128 4 forw and back 1904.0 5101 323.480010986328
master LSTM CPU 512 128 16 forward 2544.0 209 1380.0
master LSTM CPU 512 128 16 backward 5842.0 2363 12170.0
master LSTM CPU 512 128 16 forw and back 9017.0 5622 10260.0
master LSTM CUDA 512 128 16 forward 1623.0 5265 284.420013427734
master LSTM CUDA 512 128 16 backward 3877.0 12523 835.919982910156
master LSTM CUDA 512 128 16 forw and back 6127.0 19981 1240.0
master LSTM CPU 512 128 32 forward 5252.0 417 2790.0
master LSTM CPU 512 128 32 backward 11673.0 4684 24410.0
master LSTM CPU 512 128 32 forw and back 17942.0 11112 20590.0
master LSTM CUDA 512 128 32 forward 3000.0 10529 568.799987792969
master LSTM CUDA 512 128 32 backward 7391.0 25020 1630.0
master LSTM CUDA 512 128 32 forw and back 11691.0 39823 2470.0
master LSTM CPU 512 128 64 forward 10627.0 833 5590.0
master LSTM CPU 512 128 64 backward 23273.0 9324 48880.0
master LSTM CPU 512 128 64 forw and back 35678.0 22088 41270.0
master LSTM CUDA 512 128 64 forward 5809.0 21057 1110.0
master LSTM CUDA 512 128 64 backward 14461.0 50012 3260.0
master LSTM CUDA 512 128 64 forw and back 22961.0 79503 4930.0
master LSTM CPU 512 512 4 forward 1885.0 64 1320.0
master LSTM CPU 512 512 4 backward 3582.0 643 10510.0
master LSTM CPU 512 512 4 forw and back 5252.0 1553 8340.0
master LSTM CUDA 512 512 4 forward 526.255004882812 1354 71.9700012207031
master LSTM CUDA 512 512 4 backward 1243.0 3179 209.940002441406
master LSTM CUDA 512 512 4 forw and back 1923.0 5198 325.890014648437
master LSTM CPU 512 512 16 forward 8037.0 256 5460.0
master LSTM CPU 512 512 16 backward 14470.0 2443 42330.0
master LSTM CPU 512 512 16 forw and back 20734.0 5813 33800.0
master LSTM CUDA 512 512 16 forward 1580.0 5422 287.880004882812
master LSTM CUDA 512 512 16 backward 4161.0 12647 837.859985351563
master LSTM CUDA 512 512 16 forw and back 6324.0 20342 1250.0
master LSTM CPU 512 512 32 forward 16144.0 512 10980.0
master LSTM CPU 512 512 32 backward 28997.0 4844 84750.0
master LSTM CPU 512 512 32 forw and back 41463.0 11495 67750.0
master LSTM CUDA 512 512 32 forward 3018.0 10846 575.75
master LSTM CUDA 512 512 32 backward 8192.0 25272 1640.0
master LSTM CUDA 512 512 32 forw and back 12107.0 40536 2480.0
master LSTM CPU 512 512 64 forward 32463.0 1024 22030.0
master LSTM CPU 512 512 64 backward 58051.0 9644 169590.0
master LSTM CPU 512 512 64 forw and back 82756.0 22855 135660.0
master LSTM CUDA 512 512 64 forward 5804.0 21694 1120.0
master LSTM CUDA 512 512 64 backward 16487.0 50520 3270.0
master LSTM CUDA 512 512 64 forw and back 23959.0 80920 4960.0
master GRU CPU 32 32 4 forward 36.9179992675781 49 61.7000007629395
master GRU CPU 32 32 4 backward 168.076995849609 844 246.169998168945
master GRU CPU 32 32 4 forw and back 357.493011474609 1737 428.230010986328
master GRU CUDA 32 32 4 forward 488.684997558594 1429 82.1999969482422
master GRU CUDA 32 32 4 backward 1466.0 3466 213.529998779297
master GRU CUDA 32 32 4 forw and back 2199.0 5809 364.190002441406
master GRU CPU 32 32 16 forward 164.274993896484 193 264.299987792969
master GRU CPU 32 32 16 backward 642.880981445312 3149 1000.0
master GRU CPU 32 32 16 forw and back 1290.0 6479 1720.0
master GRU CUDA 32 32 16 forward 1474.0 5713 328.670013427734
master GRU CUDA 32 32 16 backward 4901.0 13607 844.5
master GRU CUDA 32 32 16 forw and back 7431.0 22671 1390.0
master GRU CPU 32 32 32 forward 337.221008300781 385 534.419982910156
master GRU CPU 32 32 32 backward 1263.0 6221 2020.0
master GRU CPU 32 32 32 forw and back 2519.0 12799 3440.0
master GRU CUDA 32 32 32 forward 2788.0 11425 657.299987792969
master GRU CUDA 32 32 32 backward 9441.0 27127 1650.0
master GRU CUDA 32 32 32 forw and back 14165.0 45151 2770.0
master GRU CPU 32 32 64 forward 682.10302734375 769 1050.0
master GRU CPU 32 32 64 backward 2534.0 12365 4060.0
master GRU CPU 32 32 64 forw and back 5012.0 25439 6900.0
master GRU CUDA 32 32 64 forward 5414.0 22849 1280.0
master GRU CUDA 32 32 64 backward 18435.0 54167 3290.0
master GRU CUDA 32 32 64 forw and back 27756.0 90111 5540.0
master GRU CPU 32 128 4 forward 145.934005737305 49 238.020004272461
master GRU CPU 32 128 4 backward 290.269012451172 844 747.299987792969
master GRU CPU 32 128 4 forw and back 601.927978515625 1737 1150.0
master GRU CUDA 32 128 4 forward 469.497985839844 1437 82.3300018310547
master GRU CUDA 32 128 4 backward 1481.0 3502 214.089996337891
master GRU CUDA 32 128 4 forw and back 2228.0 5845 364.75
master GRU CPU 32 128 16 forward 617.14697265625 193 1000.0
master GRU CPU 32 128 16 backward 1132.0 3149 3090.0
master GRU CPU 32 128 16 forw and back 2237.0 6479 4830.0
master GRU CUDA 32 128 16 forward 1491.0 5745 329.170013427734
master GRU CUDA 32 128 16 backward 4994.0 13751 846.75
master GRU CUDA 32 128 16 forw and back 7488.0 22815 1390.0
master GRU CPU 32 128 32 forward 1244.0 385 2020.0
master GRU CPU 32 128 32 backward 2306.0 6221 6230.0
master GRU CPU 32 128 32 forw and back 4461.0 12799 9730.0
master GRU CUDA 32 128 32 forward 2786.0 11489 658.299987792969
master GRU CUDA 32 128 32 backward 9673.0 27415 1650.0
master GRU CUDA 32 128 32 forw and back 14565.0 45439 2780.0
master GRU CPU 32 128 64 forward 2508.0 769 4070.0
master GRU CPU 32 128 64 backward 4782.0 12365 12510.0
master GRU CPU 32 128 64 forw and back 8927.0 25439 19540.0
master GRU CUDA 32 128 64 forward 5382.0 22977 1290.0
master GRU CUDA 32 128 64 backward 18704.0 54743 3300.0
master GRU CUDA 32 128 64 forw and back 28415.0 90687 5550.0
master GRU CPU 32 512 4 forward 1414.0 56 933.469970703125
master GRU CPU 32 512 4 backward 1809.0 880 2670.0
master GRU CPU 32 512 4 forw and back 3539.0 1788 4060.0
master GRU CUDA 32 512 4 forward 471.503997802734 1458 82.6600036621094
master GRU CUDA 32 512 4 backward 1545.0 3594 217.839996337891
master GRU CUDA 32 512 4 forw and back 2299.0 5998 370.220001220703
master GRU CPU 32 512 16 forward 5816.0 224 3930.0
master GRU CPU 32 512 16 backward 7534.0 3305 11350.0
master GRU CPU 32 512 16 forw and back 14328.0 6698 17150.0
master GRU CUDA 32 512 16 forward 1482.0 5838 330.619995117187
master GRU CUDA 32 512 16 backward 5323.0 14131 861.940002441406
master GRU CUDA 32 512 16 forw and back 8156.0 23400 1410.0
master GRU CPU 32 512 32 forward 11724.0 448 7950.0
master GRU CPU 32 512 32 backward 15170.0 6537 22910.0
master GRU CPU 32 512 32 forw and back 28593.0 13242 34600.0
master GRU CUDA 32 512 32 forward 2800.0 11678 661.25
master GRU CUDA 32 512 32 backward 10281.0 28179 1680.0
master GRU CUDA 32 512 32 forw and back 15715.0 46600 2820.0
master GRU CPU 32 512 64 forward 23528.0 896 15990.0
master GRU CPU 32 512 64 backward 30385.0 13001 46050.0
master GRU CPU 32 512 64 forw and back 57075.0 26330 69500.0
master GRU CUDA 32 512 64 forward 5475.0 23358 1290.0
master GRU CUDA 32 512 64 backward 20310.0 56275 3360.0
master GRU CUDA 32 512 64 forw and back 30778.0 93000 5620.0
master GRU CPU 128 32 4 forward 49.060001373291 49 61.7000007629395
master GRU CPU 128 32 4 backward 203.59700012207 844 405.170013427734
master GRU CPU 128 32 4 forw and back 408.595001220703 1737 539.22998046875
master GRU CUDA 128 32 4 forward 499.382995605469 1429 82.1999969482422
master GRU CUDA 128 32 4 backward 1507.0 3466 213.529998779297
master GRU CUDA 128 32 4 forw and back 2247.0 5809 364.190002441406
master GRU CPU 128 32 16 forward 211.781997680664 193 264.299987792969
master GRU CPU 128 32 16 backward 763.973022460938 3149 1650.0
master GRU CPU 128 32 16 forw and back 1447.0 6479 2180.0
master GRU CUDA 128 32 16 forward 1528.0 5713 328.670013427734
master GRU CUDA 128 32 16 backward 5001.0 13607 844.5
master GRU CUDA 128 32 16 forw and back 7507.0 22671 1390.0
master GRU CPU 128 32 32 forward 430.852996826172 385 534.419982910156
master GRU CPU 128 32 32 backward 1544.0 6221 3330.0
master GRU CPU 128 32 32 forw and back 2857.0 12799 4370.0
master GRU CUDA 128 32 32 forward 2827.0 11425 657.299987792969
master GRU CUDA 128 32 32 backward 9425.0 27127 1650.0
master GRU CUDA 128 32 32 forw and back 14271.0 45151 2770.0
master GRU CPU 128 32 64 forward 868.632995605469 769 1050.0
master GRU CPU 128 32 64 backward 3253.0 12365 6680.0
master GRU CPU 128 32 64 forw and back 5797.0 25439 8770.0
master GRU CUDA 128 32 64 forward 5454.0 22849 1280.0
master GRU CUDA 128 32 64 backward 18304.0 54167 3290.0
master GRU CUDA 128 32 64 forw and back 27904.0 90111 5540.0
master GRU CPU 128 128 4 forward 504.920013427734 49 238.020004272461
master GRU CPU 128 128 4 backward 1054.0 852 1170.0
master GRU CPU 128 128 4 forw and back 1904.0 1741 1400.0
master GRU CUDA 128 128 4 forward 476.441986083984 1437 82.3300018310547
master GRU CUDA 128 128 4 backward 1482.0 3502 214.089996337891
master GRU CUDA 128 128 4 forw and back 2258.0 5845 364.75
master GRU CPU 128 128 16 forward 2126.0 193 1000.0
master GRU CPU 128 128 16 backward 4301.0 3181 4860.0
master GRU CPU 128 128 16 forw and back 4592.0 6495 5850.0
master GRU CUDA 128 128 16 forward 1507.0 5745 329.170013427734
master GRU CUDA 128 128 16 backward 5061.0 13751 846.75
master GRU CUDA 128 128 16 forw and back 7577.0 22815 1390.0
master GRU CPU 128 128 32 forward 4291.0 385 2020.0
master GRU CPU 128 128 32 backward 8731.0 6285 9780.0
master GRU CPU 128 128 32 forw and back 14893.0 12831 11780.0
master GRU CUDA 128 128 32 forward 2798.0 11489 658.299987792969
master GRU CUDA 128 128 32 backward 9674.0 27415 1650.0
master GRU CUDA 128 128 32 forw and back 14475.0 45439 2780.0
master GRU CPU 128 128 64 forward 8624.0 769 4070.0
master GRU CPU 128 128 64 backward 17444.0 12493 19620.0
master GRU CPU 128 128 64 forw and back 29794.0 25503 23650.0
master GRU CUDA 128 128 64 forward 5462.0 22977 1290.0
master GRU CUDA 128 128 64 backward 18837.0 54743 3300.0
master GRU CUDA 128 128 64 forw and back 28386.0 90687 5550.0
master GRU CPU 128 512 4 forward 1494.0 56 933.469970703125
master GRU CPU 128 512 4 backward 2102.0 880 4230.0
master GRU CPU 128 512 4 forw and back 3756.0 1788 4870.0
master GRU CUDA 128 512 4 forward 475.424987792969 1458 82.6600036621094
master GRU CUDA 128 512 4 backward 1550.0 3594 217.839996337891
master GRU CUDA 128 512 4 forw and back 2331.0 5998 370.220001220703
master GRU CPU 128 512 16 forward 6206.0 224 3930.0
master GRU CPU 128 512 16 backward 8760.0 3305 17620.0
master GRU CPU 128 512 16 forw and back 15116.0 6698 20420.0
master GRU CUDA 128 512 16 forward 1505.0 5838 330.619995117187
master GRU CUDA 128 512 16 backward 5327.0 14131 861.940002441406
master GRU CUDA 128 512 16 forw and back 7866.0 23400 1410.0
master GRU CPU 128 512 32 forward 12577.0 448 7950.0
master GRU CPU 128 512 32 backward 17580.0 6537 35470.0
master GRU CPU 128 512 32 forw and back 30524.0 13242 41150.0
master GRU CUDA 128 512 32 forward 2848.0 11678 661.25
master GRU CUDA 128 512 32 backward 10367.0 28179 1680.0
master GRU CUDA 128 512 32 forw and back 15281.0 46600 2820.0
master GRU CPU 128 512 64 forward 25270.0 896 15990.0
master GRU CPU 128 512 64 backward 35290.0 13001 71160.0
master GRU CPU 128 512 64 forw and back 60744.0 26330 82620.0
master GRU CUDA 128 512 64 forward 5532.0 23358 1290.0
master GRU CUDA 128 512 64 backward 20284.0 56275 3360.0
master GRU CUDA 128 512 64 forw and back 30075.0 93000 5620.0
master GRU CPU 512 32 4 forward 319.101989746094 49 61.7000007629395
master GRU CPU 512 32 4 backward 923.757995605469 859 1020.0
master GRU CPU 512 32 4 forw and back 1572.0 1748 982.380004882813
master GRU CUDA 512 32 4 forward 507.290008544922 1449 82.5199966430664
master GRU CUDA 512 32 4 backward 1492.0 3498 214.029998779297
master GRU CUDA 512 32 4 forw and back 2245.0 5861 365.0
master GRU CPU 512 32 16 forward 1164.0 193 264.299987792969
master GRU CPU 512 32 16 backward 3791.0 3212 4240.0
master GRU CPU 512 32 16 forw and back 6120.0 6526 4010.0
master GRU CUDA 512 32 16 forward 1564.0 5793 329.920013427734
master GRU CUDA 512 32 16 backward 5027.0 13735 846.5
master GRU CUDA 512 32 16 forw and back 7642.0 22879 1400.0
master GRU CPU 512 32 32 forward 2637.0 385 534.419982910156
master GRU CPU 512 32 32 backward 7665.0 6348 8530.0
master GRU CPU 512 32 32 forw and back 12242.0 12894 8080.0
master GRU CUDA 512 32 32 forward 2881.0 11585 659.799987792969
master GRU CUDA 512 32 32 backward 9533.0 27383 1650.0
master GRU CUDA 512 32 32 forw and back 14572.0 45567 2780.0
master GRU CPU 512 32 64 forward 5305.0 769 1050.0
master GRU CPU 512 32 64 backward 15232.0 12620 17120.0
master GRU CPU 512 32 64 forw and back 24258.0 25630 16220.0
master GRU CUDA 512 32 64 forward 5537.0 23169 1290.0
master GRU CUDA 512 32 64 backward 18534.0 54679 3300.0
master GRU CUDA 512 32 64 forw and back 28153.0 90943 5550.0
master GRU CPU 512 128 4 forward 593.072998046875 49 238.020004272461
master GRU CPU 512 128 4 backward 1418.0 859 2910.0
master GRU CPU 512 128 4 forw and back 2217.0 1748 2400.0
master GRU CUDA 512 128 4 forward 493.325988769531 1457 82.6399993896484
master GRU CUDA 512 128 4 backward 1512.0 3534 214.589996337891
master GRU CUDA 512 128 4 forw and back 2295.0 5897 365.559997558594
master GRU CPU 512 128 16 forward 2444.0 193 1000.0
master GRU CPU 512 128 16 backward 5984.0 3212 11940.0
master GRU CPU 512 128 16 forw and back 8924.0 6526 9940.0
master GRU CUDA 512 128 16 forward 1503.0 5825 330.420013427734
master GRU CUDA 512 128 16 backward 5069.0 13879 848.75
master GRU CUDA 512 128 16 forw and back 7608.0 23023 1400.0
master GRU CPU 512 128 32 forward 4212.0 385 2020.0
master GRU CPU 512 128 32 backward 11972.0 6348 23990.0
master GRU CPU 512 128 32 forw and back 17700.0 12894 19990.0
master GRU CUDA 512 128 32 forward 2811.0 11649 660.799987792969
master GRU CUDA 512 128 32 backward 9804.0 27671 1650.0
master GRU CUDA 512 128 32 forw and back 14645.0 45855 2790.0
master GRU CPU 512 128 64 forward 10127.0 769 4070.0
master GRU CPU 512 128 64 backward 23969.0 12620 48070.0
master GRU CPU 512 128 64 forw and back 35305.0 25630 40100.0
master GRU CUDA 512 128 64 forward 5483.0 23297 1290.0
master GRU CUDA 512 128 64 backward 18967.0 55255 3310.0
master GRU CUDA 512 128 64 forw and back 28818.0 91519 5560.0
master GRU CPU 512 512 4 forward 1767.0 56 933.469970703125
master GRU CPU 512 512 4 backward 3335.0 887 10480.0
master GRU CPU 512 512 4 forw and back 5143.0 1795 8110.0
master GRU CUDA 512 512 4 forward 533.997009277344 1478 82.9700012207031
master GRU CUDA 512 512 4 backward 1617.0 3626 218.339996337891
master GRU CUDA 512 512 4 forw and back 2420.0 6050 371.029998779297
master GRU CPU 512 512 16 forward 7644.0 224 3930.0
master GRU CPU 512 512 16 backward 15006.0 3336 42710.0
master GRU CPU 512 512 16 forw and back 20369.0 6729 33500.0
master GRU CUDA 512 512 16 forward 1656.0 5918 331.880004882812
master GRU CUDA 512 512 16 backward 5598.0 14259 863.940002441406
master GRU CUDA 512 512 16 forw and back 8168.0 23608 1420.0
master GRU CPU 512 512 32 forward 15379.0 448 7950.0
master GRU CPU 512 512 32 backward 30088.0 6600 85680.0
master GRU CPU 512 512 32 forw and back 40615.0 13305 67360.0
master GRU CUDA 512 512 32 forward 3145.0 11838 663.75
master GRU CUDA 512 512 32 backward 10914.0 28435 1680.0
master GRU CUDA 512 512 32 forw and back 15935.0 47016 2820.0
master GRU CPU 512 512 64 forward 30859.0 896 15990.0
master GRU CPU 512 512 64 backward 60045.0 13128 171620.0
master GRU CPU 512 512 64 forw and back 81051.0 26457 135070.0
master GRU CUDA 512 512 64 forward 6096.0 23678 1300.0
master GRU CUDA 512 512 64 backward 22248.0 56787 3370.0
master GRU CUDA 512 512 64 forw and back 31378.0 93832 5640.0
pr1761_multigate LSTM CPU 32 32 4 forward 42.0589981079102 21 54.1399993896484
pr1761_multigate LSTM CPU 32 32 4 backward 151.651000976562 824 176.080001831055
pr1761_multigate LSTM CPU 32 32 4 forw and back 321.837005615234 1779 347.859985351562
pr1761_multigate LSTM CUDA 32 32 4 forward 248.850006103516 557 49.3300018310547
pr1761_multigate LSTM CUDA 32 32 4 backward 1017.0 3199 212.309997558594
pr1761_multigate LSTM CUDA 32 32 4 forw and back 1550.0 4733 328.299987792969
pr1761_multigate LSTM CPU 32 32 16 forward 176.811996459961 81 228.229995727539
pr1761_multigate LSTM CPU 32 32 16 backward 568.111999511719 3164 720.22998046875
pr1761_multigate LSTM CPU 32 32 16 forw and back 1157.0 6711 1370.0
pr1761_multigate LSTM CUDA 32 32 16 forward 687.651977539062 2225 197.169998168945
pr1761_multigate LSTM CUDA 32 32 16 backward 3388.0 12715 847.169982910156
pr1761_multigate LSTM CUDA 32 32 16 forw and back 5056.0 18509 1260.0
pr1761_multigate LSTM CPU 32 32 32 forward 360.375 161 460.359985351563
pr1761_multigate LSTM CPU 32 32 32 backward 1118.0 6285 1410.0
pr1761_multigate LSTM CPU 32 32 32 forw and back 2261.0 13289 2740.0
pr1761_multigate LSTM CUDA 32 32 32 forward 1288.0 4449 394.299987792969
pr1761_multigate LSTM CUDA 32 32 32 backward 6496.0 25404 1650.0
pr1761_multigate LSTM CUDA 32 32 32 forw and back 9757.0 36879 2510.0
pr1761_multigate LSTM CPU 32 32 64 forward 724.122009277344 321 924.619995117188
pr1761_multigate LSTM CPU 32 32 64 backward 2240.0 12525 2830.0
pr1761_multigate LSTM CPU 32 32 64 forw and back 4488.0 26441 5480.0
pr1761_multigate LSTM CUDA 32 32 64 forward 2456.0 8897 788.559997558594
pr1761_multigate LSTM CUDA 32 32 64 backward 12751.0 50780 3310.0
pr1761_multigate LSTM CUDA 32 32 64 forw and back 19279.0 73615 5000.0
pr1761_multigate LSTM CPU 32 128 4 forward 154.302993774414 21 210.639999389648
pr1761_multigate LSTM CPU 32 128 4 backward 247.195999145508 824 473.200012207031
pr1761_multigate LSTM CPU 32 128 4 forw and back 543.414978027344 1779 887.22998046875
pr1761_multigate LSTM CUDA 32 128 4 forward 254.785003662109 561 49.3899993896484
pr1761_multigate LSTM CUDA 32 128 4 backward 1052.0 3243 213.0
pr1761_multigate LSTM CUDA 32 128 4 forw and back 1597.0 4777 328.980010986328
pr1761_multigate LSTM CPU 32 128 16 forward 635.35302734375 81 890.22998046875
pr1761_multigate LSTM CPU 32 128 16 backward 957.734008789063 3164 1880.0
pr1761_multigate LSTM CPU 32 128 16 forw and back 2027.0 6711 3530.0
pr1761_multigate LSTM CUDA 32 128 16 forward 714.284973144531 2241 197.419998168945
pr1761_multigate LSTM CUDA 32 128 16 backward 3451.0 12891 849.919982910156
pr1761_multigate LSTM CUDA 32 128 16 forw and back 5187.0 18685 1260.0
pr1761_multigate LSTM CPU 32 128 32 forward 1288.0 161 1750.0
pr1761_multigate LSTM CPU 32 128 32 backward 1938.0 6285 3770.0
pr1761_multigate LSTM CPU 32 128 32 forw and back 4065.0 13289 7070.0
pr1761_multigate LSTM CUDA 32 128 32 forward 1306.0 4481 394.799987792969
pr1761_multigate LSTM CUDA 32 128 32 backward 6603.0 25756 1660.0
pr1761_multigate LSTM CUDA 32 128 32 forw and back 9957.0 37231 2510.0
pr1761_multigate LSTM CPU 32 128 64 forward 2591.0 321 3520.0
pr1761_multigate LSTM CPU 32 128 64 backward 4008.0 12525 7560.0
pr1761_multigate LSTM CPU 32 128 64 forw and back 8139.0 26441 14170.0
pr1761_multigate LSTM CUDA 32 128 64 forward 2502.0 8961 789.559997558594
pr1761_multigate LSTM CUDA 32 128 64 backward 12875.0 51484 3320.0
pr1761_multigate LSTM CUDA 32 128 64 forw and back 19406.0 74319 5020.0
pr1761_multigate LSTM CPU 32 512 4 forward 1443.0 32 833.780029296875
pr1761_multigate LSTM CPU 32 512 4 backward 1446.0 836 1610.0
pr1761_multigate LSTM CPU 32 512 4 forw and back 3283.0 1814 2950.0
pr1761_multigate LSTM CUDA 32 512 4 forward 258.283996582031 582 49.7200012207031
pr1761_multigate LSTM CUDA 32 512 4 backward 1099.0 3271 213.440002441406
pr1761_multigate LSTM CUDA 32 512 4 forw and back 1651.0 4858 330.890014648437
pr1761_multigate LSTM CPU 32 512 16 forward 6000.0 128 3440.0
pr1761_multigate LSTM CPU 32 512 16 backward 6059.0 3212 6540.0
pr1761_multigate LSTM CPU 32 512 16 forw and back 13348.0 6854 12080.0
pr1761_multigate LSTM CUDA 32 512 16 forward 718.882019042969 2334 198.880004882812
pr1761_multigate LSTM CUDA 32 512 16 backward 3616.0 13015 851.859985351563
pr1761_multigate LSTM CUDA 32 512 16 forw and back 5308.0 18982 1270.0
pr1761_multigate LSTM CPU 32 512 32 forward 12162.0 256 6950.0
pr1761_multigate LSTM CPU 32 512 32 backward 12209.0 6381 13120.0
pr1761_multigate LSTM CPU 32 512 32 forw and back 26654.0 13576 24240.0
pr1761_multigate LSTM CUDA 32 512 32 forward 1328.0 4670 397.75
pr1761_multigate LSTM CUDA 32 512 32 backward 6839.0 26008 1660.0
pr1761_multigate LSTM CUDA 32 512 32 forw and back 10249.0 37816 2520.0
pr1761_multigate LSTM CPU 32 512 64 forward 24408.0 512 13960.0
pr1761_multigate LSTM CPU 32 512 64 backward 24593.0 12717 26270.0
pr1761_multigate LSTM CPU 32 512 64 forw and back 53268.0 27016 48570.0
pr1761_multigate LSTM CUDA 32 512 64 forward 2553.0 9342 795.52001953125
pr1761_multigate LSTM CUDA 32 512 64 backward 13356.0 51992 3330.0
pr1761_multigate LSTM CUDA 32 512 64 forw and back 20183.0 75480 5030.0
pr1761_multigate LSTM CPU 128 32 4 forward 55.5870018005371 21 54.1399993896484
pr1761_multigate LSTM CPU 128 32 4 backward 191.156005859375 824 356.079986572266
pr1761_multigate LSTM CPU 128 32 4 forw and back 383.257995605469 1779 479.859985351562
pr1761_multigate LSTM CUDA 128 32 4 forward 267.013000488281 557 49.3300018310547
pr1761_multigate LSTM CUDA 128 32 4 backward 1084.0 3199 212.309997558594
pr1761_multigate LSTM CUDA 128 32 4 forw and back 1636.0 4733 328.299987792969
pr1761_multigate LSTM CPU 128 32 16 forward 234.233001708984 81 228.229995727539
pr1761_multigate LSTM CPU 128 32 16 backward 715.557983398438 3164 1440.0
pr1761_multigate LSTM CPU 128 32 16 forw and back 1358.0 6711 1920.0
pr1761_multigate LSTM CUDA 128 32 16 forward 720.684020996094 2225 197.169998168945
pr1761_multigate LSTM CUDA 128 32 16 backward 3442.0 12715 847.169982910156
pr1761_multigate LSTM CUDA 128 32 16 forw and back 5143.0 18509 1260.0
pr1761_multigate LSTM CPU 128 32 32 forward 474.350006103516 161 460.359985351563
pr1761_multigate LSTM CPU 128 32 32 backward 1439.0 6285 2900.0
pr1761_multigate LSTM CPU 128 32 32 forw and back 2681.0 13289 3850.0
pr1761_multigate LSTM CUDA 128 32 32 forward 1330.0 4449 394.299987792969
pr1761_multigate LSTM CUDA 128 32 32 backward 6586.0 25404 1650.0
pr1761_multigate LSTM CUDA 128 32 32 forw and back 9891.0 36879 2510.0
pr1761_multigate LSTM CPU 128 32 64 forward 952.992980957031 321 924.619995117188
pr1761_multigate LSTM CPU 128 32 64 backward 3052.0 12525 5820.0
pr1761_multigate LSTM CPU 128 32 64 forw and back 5437.0 26441 7710.0
pr1761_multigate LSTM CUDA 128 32 64 forward 2529.0 8897 788.559997558594
pr1761_multigate LSTM CUDA 128 32 64 backward 12845.0 50780 3310.0
pr1761_multigate LSTM CUDA 128 32 64 forw and back 19381.0 73615 5000.0
pr1761_multigate LSTM CPU 128 128 4 forward 507.778015136719 21 210.639999389648
pr1761_multigate LSTM CPU 128 128 4 backward 963.617004394531 832 940.580017089844
pr1761_multigate LSTM CPU 128 128 4 forw and back 1800.0 1783 1140.0
pr1761_multigate LSTM CUDA 128 128 4 forward 262.748992919922 561 49.3899993896484
pr1761_multigate LSTM CUDA 128 128 4 backward 1070.0 3243 213.0
pr1761_multigate LSTM CUDA 128 128 4 forw and back 1622.0 4777 328.980010986328
pr1761_multigate LSTM CPU 128 128 16 forward 2136.0 81 890.22998046875
pr1761_multigate LSTM CPU 128 128 16 backward 3893.0 3196 3740.0
pr1761_multigate LSTM CPU 128 128 16 forw and back 7014.0 6727 4640.0
pr1761_multigate LSTM CUDA 128 128 16 forward 737.192016601562 2241 197.419998168945
pr1761_multigate LSTM CUDA 128 128 16 backward 3495.0 12891 849.919982910156
pr1761_multigate LSTM CUDA 128 128 16 forw and back 5242.0 18685 1260.0
pr1761_multigate LSTM CPU 128 128 32 forward 4290.0 161 1750.0
pr1761_multigate LSTM CPU 128 128 32 backward 7861.0 6349 7510.0
pr1761_multigate LSTM CPU 128 128 32 forw and back 14093.0 13321 9310.0
pr1761_multigate LSTM CUDA 128 128 32 forward 1329.0 4481 394.799987792969
pr1761_multigate LSTM CUDA 128 128 32 backward 6652.0 25756 1660.0
pr1761_multigate LSTM CUDA 128 128 32 forw and back 10033.0 37231 2510.0
pr1761_multigate LSTM CPU 128 128 64 forward 8724.0 321 3520.0
pr1761_multigate LSTM CPU 128 128 64 backward 15784.0 12653 15040.0
pr1761_multigate LSTM CPU 128 128 64 forw and back 28257.0 26505 18650.0
pr1761_multigate LSTM CUDA 128 128 64 forward 2550.0 8961 789.559997558594
pr1761_multigate LSTM CUDA 128 128 64 backward 12912.0 51484 3320.0
pr1761_multigate LSTM CUDA 128 128 64 forw and back 19573.0 74319 5020.0
pr1761_multigate LSTM CPU 128 512 4 forward 1513.0 32 833.780029296875
pr1761_multigate LSTM CPU 128 512 4 backward 1726.0 836 3190.0
pr1761_multigate LSTM CPU 128 512 4 forw and back 3499.0 1814 3780.0
pr1761_multigate LSTM CUDA 128 512 4 forward 285.451995849609 582 49.7200012207031
pr1761_multigate LSTM CUDA 128 512 4 backward 1142.0 3271 213.440002441406
pr1761_multigate LSTM CUDA 128 512 4 forw and back 1721.0 4858 330.890014648437
pr1761_multigate LSTM CPU 128 512 16 forward 6449.0 128 3440.0
pr1761_multigate LSTM CPU 128 512 16 backward 7435.0 3212 12910.0
pr1761_multigate LSTM CPU 128 512 16 forw and back 14501.0 6854 15440.0
pr1761_multigate LSTM CUDA 128 512 16 forward 788.846008300781 2334 198.880004882812
pr1761_multigate LSTM CUDA 128 512 16 backward 3723.0 13015 851.859985351563
pr1761_multigate LSTM CUDA 128 512 16 forw and back 5460.0 18982 1270.0
pr1761_multigate LSTM CPU 128 512 32 forward 13083.0 256 6950.0
pr1761_multigate LSTM CPU 128 512 32 backward 14822.0 6381 25860.0
pr1761_multigate LSTM CPU 128 512 32 forw and back 28938.0 13576 30980.0
pr1761_multigate LSTM CUDA 128 512 32 forward 1454.0 4670 397.75
pr1761_multigate LSTM CUDA 128 512 32 backward 7158.0 26008 1660.0
pr1761_multigate LSTM CUDA 128 512 32 forw and back 10586.0 37816 2520.0
pr1761_multigate LSTM CPU 128 512 64 forward 26329.0 512 13960.0
pr1761_multigate LSTM CPU 128 512 64 backward 29689.0 12717 51760.0
pr1761_multigate LSTM CPU 128 512 64 forw and back 57559.0 27016 62060.0
pr1761_multigate LSTM CUDA 128 512 64 forward 2789.0 9342 795.52001953125
pr1761_multigate LSTM CUDA 128 512 64 backward 14019.0 51992 3330.0
pr1761_multigate LSTM CUDA 128 512 64 forw and back 20952.0 75480 5030.0
pr1761_multigate LSTM CPU 512 32 4 forward 281.427001953125 21 54.1399993896484
pr1761_multigate LSTM CPU 512 32 4 backward 886.869995117188 839 1050.0
pr1761_multigate LSTM CPU 512 32 4 forw and back 1483.0 1790 1007.0
pr1761_multigate LSTM CUDA 512 32 4 forward 274.860992431641 577 49.6399993896484
pr1761_multigate LSTM CUDA 512 32 4 backward 1097.0 3231 212.809997558594
pr1761_multigate LSTM CUDA 512 32 4 forw and back 1673.0 4785 329.109985351562
pr1761_multigate LSTM CPU 512 32 16 forward 1197.0 81 228.229995727539
pr1761_multigate LSTM CPU 512 32 16 backward 3595.0 3227 4390.0
pr1761_multigate LSTM CPU 512 32 16 forw and back 5754.0 6758 4120.0
pr1761_multigate LSTM CUDA 512 32 16 forward 741.932983398437 2305 198.419998168945
pr1761_multigate LSTM CUDA 512 32 16 backward 3532.0 12843 849.169982910156
pr1761_multigate LSTM CUDA 512 32 16 forw and back 5294.0 18717 1260.0
pr1761_multigate LSTM CPU 512 32 32 forward 2410.0 161 460.359985351563
pr1761_multigate LSTM CPU 512 32 32 backward 7267.0 6412 8840.0
pr1761_multigate LSTM CPU 512 32 32 forw and back 11516.0 13384 8300.0
pr1761_multigate LSTM CUDA 512 32 32 forward 1359.0 4609 396.799987792969
pr1761_multigate LSTM CUDA 512 32 32 backward 6715.0 25660 1660.0
pr1761_multigate LSTM CUDA 512 32 32 forw and back 10137.0 37295 2510.0
pr1761_multigate LSTM CPU 512 32 64 forward 3643.0 321 924.619995117188
pr1761_multigate LSTM CPU 512 32 64 backward 14499.0 12780 17750.0
pr1761_multigate LSTM CPU 512 32 64 forw and back 22928.0 26632 16650.0
pr1761_multigate LSTM CUDA 512 32 64 forward 2553.0 9217 793.559997558594
pr1761_multigate LSTM CUDA 512 32 64 backward 12995.0 51292 3320.0
pr1761_multigate LSTM CUDA 512 32 64 forw and back 19851.0 74447 5020.0
pr1761_multigate LSTM CPU 512 128 4 forward 575.971984863281 21 210.639999389648
pr1761_multigate LSTM CPU 512 128 4 backward 1308.0 839 2750.0
pr1761_multigate LSTM CPU 512 128 4 forw and back 2122.0 1790 2210.0
pr1761_multigate LSTM CUDA 512 128 4 forward 311.31201171875 581 49.7000007629395
pr1761_multigate LSTM CUDA 512 128 4 backward 1118.0 3275 213.5
pr1761_multigate LSTM CUDA 512 128 4 forw and back 1723.0 4829 329.799987792969
pr1761_multigate LSTM CPU 512 128 16 forward 2427.0 81 890.22998046875
pr1761_multigate LSTM CPU 512 128 16 backward 5588.0 3227 11190.0
pr1761_multigate LSTM CPU 512 128 16 forw and back 8536.0 6758 9090.0
pr1761_multigate LSTM CUDA 512 128 16 forward 877.736022949219 2321 198.669998168945
pr1761_multigate LSTM CUDA 512 128 16 backward 3624.0 13019 851.919982910156
pr1761_multigate LSTM CUDA 512 128 16 forw and back 5539.0 18893 1260.0
pr1761_multigate LSTM CPU 512 128 32 forward 4971.0 161 1750.0
pr1761_multigate LSTM CPU 512 128 32 backward 11162.0 6412 22460.0
pr1761_multigate LSTM CPU 512 128 32 forw and back 17002.0 13384 18260.0
pr1761_multigate LSTM CUDA 512 128 32 forward 1584.0 4641 397.299987792969
pr1761_multigate LSTM CUDA 512 128 32 backward 6910.0 26012 1660.0
pr1761_multigate LSTM CUDA 512 128 32 forw and back 10632.0 37647 2520.0
pr1761_multigate LSTM CPU 512 128 64 forward 10123.0 321 3520.0
pr1761_multigate LSTM CPU 512 128 64 backward 22419.0 12780 44980.0
pr1761_multigate LSTM CPU 512 128 64 forw and back 33834.0 26632 36600.0
pr1761_multigate LSTM CUDA 512 128 64 forward 3038.0 9281 794.559997558594
pr1761_multigate LSTM CUDA 512 128 64 backward 13390.0 51996 3330.0
pr1761_multigate LSTM CUDA 512 128 64 forw and back 20672.0 75151 5030.0
pr1761_multigate LSTM CPU 512 512 4 forward 1791.0 32 833.780029296875
pr1761_multigate LSTM CPU 512 512 4 backward 3316.0 843 9520.0
pr1761_multigate LSTM CPU 512 512 4 forw and back 3526.0 1821 7110.0
pr1761_multigate LSTM CUDA 512 512 4 forward 312.209014892578 602 50.0299987792969
pr1761_multigate LSTM CUDA 512 512 4 backward 1201.0 3303 213.940002441406
pr1761_multigate LSTM CUDA 512 512 4 forw and back 1778.0 4910 331.700012207031
pr1761_multigate LSTM CPU 512 512 16 forward 7796.0 128 3440.0
pr1761_multigate LSTM CPU 512 512 16 backward 13284.0 3243 38360.0
pr1761_multigate LSTM CPU 512 512 16 forw and back 19260.0 6885 28890.0
pr1761_multigate LSTM CUDA 512 512 16 forward 869.986999511719 2414 200.119995117187
pr1761_multigate LSTM CUDA 512 512 16 backward 3947.0 13143 853.859985351563
pr1761_multigate LSTM CUDA 512 512 16 forw and back 5743.0 19190 1270.0
pr1761_multigate LSTM CPU 512 512 32 forward 15670.0 256 6950.0
pr1761_multigate LSTM CPU 512 512 32 backward 26938.0 6444 76810.0
pr1761_multigate LSTM CPU 512 512 32 forw and back 38295.0 13639 57930.0
pr1761_multigate LSTM CUDA 512 512 32 forward 1604.0 4830 400.25
pr1761_multigate LSTM CUDA 512 512 32 backward 7653.0 26264 1670.0
pr1761_multigate LSTM CUDA 512 512 32 forw and back 11028.0 38232 2530.0
pr1761_multigate LSTM CPU 512 512 64 forward 31388.0 512 13960.0
pr1761_multigate LSTM CPU 512 512 64 backward 54138.0 12844 153700.0
pr1761_multigate LSTM CPU 512 512 64 forw and back 76616.0 27143 116000.0
pr1761_multigate LSTM CUDA 512 512 64 forward 3067.0 9662 800.52001953125
pr1761_multigate LSTM CUDA 512 512 64 backward 15159.0 52504 3330.0
pr1761_multigate LSTM CUDA 512 512 64 forw and back 21667.0 76312 5050.0
pr1761_multigate GRU CPU 32 32 4 forward 33.560001373291 25 39.1100006103516
pr1761_multigate GRU CPU 32 32 4 backward 167.455001831055 1117 194.770004272461
pr1761_multigate GRU CPU 32 32 4 forw and back 346.714996337891 1894 369.230010986328
pr1761_multigate GRU CUDA 32 32 4 forward 278.77099609375 597 46.7700004577637
pr1761_multigate GRU CUDA 32 32 4 backward 1348.0 3455 214.830001831055
pr1761_multigate GRU CUDA 32 32 4 forw and back 1918.0 4782 346.049987792969
pr1761_multigate GRU CPU 32 32 16 forward 149.580001831055 97 165.199996948242
pr1761_multigate GRU CPU 32 32 16 backward 659.940979003906 4238 796.590026855469
pr1761_multigate GRU CPU 32 32 16 forw and back 1254.0 7104 1450.0
pr1761_multigate GRU CUDA 32 32 16 forward 762.4990234375 2385 186.919998168945
pr1761_multigate GRU CUDA 32 32 16 backward 4445.0 13560 849.549987792969
pr1761_multigate GRU CUDA 32 32 16 forw and back 6255.0 18560 1320.0
pr1761_multigate GRU CPU 32 32 32 forward 306.828002929687 193 333.329986572266
pr1761_multigate GRU CPU 32 32 32 backward 1292.0 8398 1560.0
pr1761_multigate GRU CPU 32 32 32 forw and back 2424.0 14048 2900.0
pr1761_multigate GRU CUDA 32 32 32 forward 1399.0 4769 373.799987792969
pr1761_multigate GRU CUDA 32 32 32 backward 8518.0 27032 1660.0
pr1761_multigate GRU CUDA 32 32 32 forw and back 11977.0 36928 2630.0
pr1761_multigate GRU CPU 32 32 64 forward 619.963989257812 385 669.590026855469
pr1761_multigate GRU CPU 32 32 64 backward 2591.0 16718 3130.0
pr1761_multigate GRU CPU 32 32 64 forw and back 4806.0 27936 5810.0
pr1761_multigate GRU CUDA 32 32 64 forward 2662.0 9537 747.559997558594
pr1761_multigate GRU CUDA 32 32 64 backward 16727.0 53976 3310.0
pr1761_multigate GRU CUDA 32 32 64 forw and back 23488.0 73664 5250.0
pr1761_multigate GRU CPU 32 128 4 forward 130.85400390625 25 151.110000610352
pr1761_multigate GRU CPU 32 128 4 backward 274.640014648437 1117 506.890014648437
pr1761_multigate GRU CPU 32 128 4 forw and back 553.372985839844 1894 831.109985351563
pr1761_multigate GRU CUDA 32 128 4 forward 271.454986572266 609 46.9500007629395
pr1761_multigate GRU CUDA 32 128 4 backward 1353.0 3491 215.389999389648
pr1761_multigate GRU CUDA 32 128 4 forw and back 1936.0 4818 346.609985351562
pr1761_multigate GRU CPU 32 128 16 forward 546.9990234375 97 640.200012207031
pr1761_multigate GRU CPU 32 128 16 backward 1046.0 4238 2040.0
pr1761_multigate GRU CPU 32 128 16 forw and back 2017.0 7104 3330.0
pr1761_multigate GRU CUDA 32 128 16 forward 767.724975585938 2433 187.669998168945
pr1761_multigate GRU CUDA 32 128 16 backward 4547.0 13704 851.799987792969
pr1761_multigate GRU CUDA 32 128 16 forw and back 6358.0 18704 1320.0
pr1761_multigate GRU CPU 32 128 32 forward 1106.0 193 1260.0
pr1761_multigate GRU CPU 32 128 32 backward 2110.0 8398 4100.0
pr1761_multigate GRU CPU 32 128 32 forw and back 3990.0 14048 6680.0
pr1761_multigate GRU CUDA 32 128 32 forward 1409.0 4865 375.299987792969
pr1761_multigate GRU CUDA 32 128 32 backward 8790.0 27320 1660.0
pr1761_multigate GRU CUDA 32 128 32 forw and back 12198.0 37216 2640.0
pr1761_multigate GRU CPU 32 128 64 forward 2227.0 385 2540.0
pr1761_multigate GRU CPU 32 128 64 backward 4352.0 16718 8230.0
pr1761_multigate GRU CPU 32 128 64 forw and back 7996.0 27936 13380.0
pr1761_multigate GRU CUDA 32 128 64 forward 2695.0 9729 750.559997558594
pr1761_multigate GRU CUDA 32 128 64 backward 17108.0 54552 3320.0
pr1761_multigate GRU CUDA 32 128 64 forw and back 23983.0 74240 5260.0
pr1761_multigate GRU CPU 32 512 4 forward 1264.0 32 594.559997558594
pr1761_multigate GRU CPU 32 512 4 backward 1575.0 1132 1700.0
pr1761_multigate GRU CPU 32 512 4 forw and back 2672.0 1924 2590.0
pr1761_multigate GRU CUDA 32 512 4 forward 269.457000732422 630 47.2799987792969
pr1761_multigate GRU CUDA 32 512 4 backward 1416.0 3583 219.139999389648
pr1761_multigate GRU CUDA 32 512 4 forw and back 1990.0 4971 352.079986572266
pr1761_multigate GRU CPU 32 512 16 forward 5298.0 128 2460.0
pr1761_multigate GRU CPU 32 512 16 backward 6639.0 4301 7040.0
pr1761_multigate GRU CPU 32 512 16 forw and back 12694.0 7230 10730.0
pr1761_multigate GRU CUDA 32 512 16 forward 764.281982421875 2526 189.119995117188
pr1761_multigate GRU CUDA 32 512 16 backward 4834.0 14084 866.97998046875
pr1761_multigate GRU CUDA 32 512 16 forw and back 6935.0 19289 1340.0
pr1761_multigate GRU CPU 32 512 32 forward 10722.0 256 4970.0
pr1761_multigate GRU CPU 32 512 32 backward 13356.0 8525 14160.0
pr1761_multigate GRU CPU 32 512 32 forw and back 22247.0 14302 21570.0
pr1761_multigate GRU CUDA 32 512 32 forward 1424.0 5054 378.25
pr1761_multigate GRU CUDA 32 512 32 backward 9320.0 28084 1690.0
pr1761_multigate GRU CUDA 32 512 32 forw and back 12783.0 38377 2670.0
pr1761_multigate GRU CPU 32 512 64 forward 21675.0 512 9990.0
pr1761_multigate GRU CPU 32 512 64 backward 26814.0 16973 28400.0
pr1761_multigate GRU CPU 32 512 64 forw and back 50952.0 28446 43270.0
pr1761_multigate GRU CUDA 32 512 64 forward 2725.0 10110 756.52001953125
pr1761_multigate GRU CUDA 32 512 64 backward 18307.0 56084 3380.0
pr1761_multigate GRU CUDA 32 512 64 forw and back 25988.0 76553 5340.0
pr1761_multigate GRU CPU 128 32 4 forward 46.640998840332 25 39.1100006103516
pr1761_multigate GRU CPU 128 32 4 backward 214.24299621582 1117 353.769989013672
pr1761_multigate GRU CPU 128 32 4 forw and back 411.119995117187 1894 480.230010986328
pr1761_multigate GRU CUDA 128 32 4 forward 281.294006347656 597 46.7700004577637
pr1761_multigate GRU CUDA 128 32 4 backward 1362.0 3455 214.830001831055
pr1761_multigate GRU CUDA 128 32 4 forw and back 1935.0 4782 346.049987792969
pr1761_multigate GRU CPU 128 32 16 forward 196.511993408203 97 165.199996948242
pr1761_multigate GRU CPU 128 32 16 backward 787.031005859375 4238 1430.0
pr1761_multigate GRU CPU 128 32 16 forw and back 1428.0 7104 1910.0
pr1761_multigate GRU CUDA 128 32 16 forward 787.830017089844 2385 186.919998168945
pr1761_multigate GRU CUDA 128 32 16 backward 4583.0 13560 849.549987792969
pr1761_multigate GRU CUDA 128 32 16 forw and back 6438.0 18560 1320.0
pr1761_multigate GRU CPU 128 32 32 forward 398.618011474609 193 333.329986572266
pr1761_multigate GRU CPU 128 32 32 backward 1575.0 8398 2870.0
pr1761_multigate GRU CPU 128 32 32 forw and back 2794.0 14048 3830.0
pr1761_multigate GRU CUDA 128 32 32 forward 1449.0 4769 373.799987792969
pr1761_multigate GRU CUDA 128 32 32 backward 8759.0 27032 1660.0
pr1761_multigate GRU CUDA 128 32 32 forw and back 12227.0 36928 2630.0
pr1761_multigate GRU CPU 128 32 64 forward 798.705993652344 385 669.590026855469
pr1761_multigate GRU CPU 128 32 64 backward 3280.0 16718 5750.0
pr1761_multigate GRU CPU 128 32 64 forw and back 5621.0 27936 7680.0
pr1761_multigate GRU CUDA 128 32 64 forward 2768.0 9537 747.559997558594
pr1761_multigate GRU CUDA 128 32 64 backward 16799.0 53976 3310.0
pr1761_multigate GRU CUDA 128 32 64 forw and back 23627.0 73664 5250.0
pr1761_multigate GRU CPU 128 128 4 forward 470.582000732422 25 151.110000610352
pr1761_multigate GRU CPU 128 128 4 backward 1024.0 1125 953.27001953125
pr1761_multigate GRU CPU 128 128 4 forw and back 1813.0 1898 1060.0
pr1761_multigate GRU CUDA 128 128 4 forward 271.450012207031 609 46.9500007629395
pr1761_multigate GRU CUDA 128 128 4 backward 1353.0 3491 215.389999389648
pr1761_multigate GRU CUDA 128 128 4 forw and back 1928.0 4818 346.609985351562
pr1761_multigate GRU CPU 128 128 16 forward 1955.0 97 640.200012207031
pr1761_multigate GRU CPU 128 128 16 backward 4127.0 4270 3810.0
pr1761_multigate GRU CPU 128 128 16 forw and back 7025.0 7120 4350.0
pr1761_multigate GRU CUDA 128 128 16 forward 762.539978027344 2433 187.669998168945
pr1761_multigate GRU CUDA 128 128 16 backward 4574.0 13704 851.799987792969
pr1761_multigate GRU CUDA 128 128 16 forw and back 6396.0 18704 1320.0
pr1761_multigate GRU CPU 128 128 32 forward 2435.0 193 1260.0
pr1761_multigate GRU CPU 128 128 32 backward 8386.0 8462 7650.0
pr1761_multigate GRU CPU 128 128 32 forw and back 14122.0 14080 8730.0
pr1761_multigate GRU CUDA 128 128 32 forward 1421.0 4865 375.299987792969
pr1761_multigate GRU CUDA 128 128 32 backward 8819.0 27320 1660.0
pr1761_multigate GRU CUDA 128 128 32 forw and back 12320.0 37216 2640.0
pr1761_multigate GRU CPU 128 128 64 forward 7949.0 385 2540.0
pr1761_multigate GRU CPU 128 128 64 backward 16790.0 16846 15330.0
pr1761_multigate GRU CPU 128 128 64 forw and back 28142.0 28000 17490.0
pr1761_multigate GRU CUDA 128 128 64 forward 2723.0 9729 750.559997558594
pr1761_multigate GRU CUDA 128 128 64 backward 17311.0 54552 3320.0
pr1761_multigate GRU CUDA 128 128 64 forw and back 24091.0 74240 5260.0
pr1761_multigate GRU CPU 128 512 4 forward 1377.0 32 594.559997558594
pr1761_multigate GRU CPU 128 512 4 backward 1902.0 1132 3260.0
pr1761_multigate GRU CPU 128 512 4 forw and back 3432.0 1924 3400.0
pr1761_multigate GRU CUDA 128 512 4 forward 275.136993408203 630 47.2799987792969
pr1761_multigate GRU CUDA 128 512 4 backward 1423.0 3583 219.139999389648
pr1761_multigate GRU CUDA 128 512 4 forw and back 2009.00012207031 4971 352.079986572266
pr1761_multigate GRU CPU 128 512 16 forward 5680.0 128 2460.0
pr1761_multigate GRU CPU 128 512 16 backward 7926.0 4301 13310.0
pr1761_multigate GRU CPU 128 512 16 forw and back 13797.0 7230 14000.0
pr1761_multigate GRU CUDA 128 512 16 forward 780.648010253906 2526 189.119995117188
pr1761_multigate GRU CUDA 128 512 16 backward 4848.0 14084 866.97998046875
pr1761_multigate GRU CUDA 128 512 16 forw and back 6705.0 19289 1340.0
pr1761_multigate GRU CPU 128 512 32 forward 11500.0 256 4970.0
pr1761_multigate GRU CPU 128 512 32 backward 15866.0 8525 26710.0
pr1761_multigate GRU CPU 128 512 32 forw and back 27474.0 14302 28130.0
pr1761_multigate GRU CUDA 128 512 32 forward 1439.0 5054 378.25
pr1761_multigate GRU CUDA 128 512 32 backward 9411.0 28084 1690.0
pr1761_multigate GRU CUDA 128 512 32 forw and back 12827.0 38377 2670.0
pr1761_multigate GRU CPU 128 512 64 forward 23182.0 512 9990.0
pr1761_multigate GRU CPU 128 512 64 backward 31875.0 16973 53520.0
pr1761_multigate GRU CPU 128 512 64 forw and back 54836.0 28446 56380.0
pr1761_multigate GRU CUDA 128 512 64 forward 2746.0 10110 756.52001953125
pr1761_multigate GRU CUDA 128 512 64 backward 18130.0 56084 3380.0
pr1761_multigate GRU CUDA 128 512 64 forw and back 25132.0 76553 5340.0
pr1761_multigate GRU CPU 512 32 4 forward 315.055999755859 25 39.1100006103516
pr1761_multigate GRU CPU 512 32 4 backward 946.289001464844 1132 988.590026855469
pr1761_multigate GRU CPU 512 32 4 forw and back 1564.0 1905 923.380004882813
pr1761_multigate GRU CUDA 512 32 4 forward 283.980010986328 617 47.0800018310547
pr1761_multigate GRU CUDA 512 32 4 backward 1379.0 3487 215.330001831055
pr1761_multigate GRU CUDA 512 32 4 forw and back 1958.0 4834 346.859985351562
pr1761_multigate GRU CPU 512 32 16 forward 1263.0 97 165.199996948242
pr1761_multigate GRU CPU 512 32 16 backward 3815.0 4301 4010.00024414062
pr1761_multigate GRU CPU 512 32 16 forw and back 6029.0 7151 3750.0
pr1761_multigate GRU CUDA 512 32 16 forward 790.833984375 2465 188.169998168945
pr1761_multigate GRU CUDA 512 32 16 backward 4549.0 13688 851.549987792969
pr1761_multigate GRU CUDA 512 32 16 forw and back 6443.0 18768 1320.0
pr1761_multigate GRU CPU 512 32 32 forward 2536.0 193 333.329986572266
pr1761_multigate GRU CPU 512 32 32 backward 6099.0 8525 8069.99951171875
pr1761_multigate GRU CPU 512 32 32 forw and back 11996.0 14143 7540.0
pr1761_multigate GRU CUDA 512 32 32 forward 1443.0 4929 376.299987792969
pr1761_multigate GRU CUDA 512 32 32 backward 8692.0 27288 1660.0
pr1761_multigate GRU CUDA 512 32 32 forw and back 12216.0 37344 2640.0
pr1761_multigate GRU CPU 512 32 64 forward 5089.0 385 669.590026855469
pr1761_multigate GRU CPU 512 32 64 backward 15242.0 16973 16190.0009765625
pr1761_multigate GRU CPU 512 32 64 forw and back 23703.0 28127 15130.0
pr1761_multigate GRU CUDA 512 32 64 forward 2723.0 9857 752.559997558594
pr1761_multigate GRU CUDA 512 32 64 backward 17028.0 54488 3320.0
pr1761_multigate GRU CUDA 512 32 64 forw and back 24133.0 74496 5270.0
pr1761_multigate GRU CPU 512 128 4 forward 559.637023925781 25 151.110000610352
pr1761_multigate GRU CPU 512 128 4 backward 1399.0 1132 2680.0
pr1761_multigate GRU CPU 512 128 4 forw and back 2139.0 1905 2060.0
pr1761_multigate GRU CUDA 512 128 4 forward 276.165008544922 629 47.2700004577637
pr1761_multigate GRU CUDA 512 128 4 backward 1374.0 3523 215.889999389648
pr1761_multigate GRU CUDA 512 128 4 forw and back 1981.0 4870 347.420013427734
pr1761_multigate GRU CPU 512 128 16 forward 2258.0 97 640.200012207031
pr1761_multigate GRU CPU 512 128 16 backward 5840.0 4301 10900.0
pr1761_multigate GRU CPU 512 128 16 forw and back 8515.0 7151 8430.0
pr1761_multigate GRU CUDA 512 128 16 forward 788.554016113281 2513 188.919998168945
pr1761_multigate GRU CUDA 512 128 16 backward 4650.0 13832 853.799987792969
pr1761_multigate GRU CUDA 512 128 16 forw and back 6462.0 18912 1330.0
pr1761_multigate GRU CPU 512 128 32 forward 4651.0 193 1260.0
pr1761_multigate GRU CPU 512 128 32 backward 11692.0 8525 21860.0
pr1761_multigate GRU CPU 512 128 32 forw and back 16916.0 14143 16940.0
pr1761_multigate GRU CUDA 512 128 32 forward 1434.0 5025 377.799987792969
pr1761_multigate GRU CUDA 512 128 32 backward 8944.0 27576 1660.0
pr1761_multigate GRU CUDA 512 128 32 forw and back 12441.0 37632 2640.0
pr1761_multigate GRU CPU 512 128 64 forward 9432.0 385 2540.0
pr1761_multigate GRU CPU 512 128 64 backward 23472.0 16973 43790.0
pr1761_multigate GRU CPU 512 128 64 forw and back 33648.0 28127 33950.0
pr1761_multigate GRU CUDA 512 128 64 forward 2722.0 10049 755.559997558594
pr1761_multigate GRU CUDA 512 128 64 backward 17468.0 55064 3330.0
pr1761_multigate GRU CUDA 512 128 64 forw and back 24193.0 75072 5280.0
pr1761_multigate GRU CPU 512 512 4 forward 1650.0 32 594.559997558594
pr1761_multigate GRU CPU 512 512 4 backward 3454.0 1139 9510.0
pr1761_multigate GRU CPU 512 512 4 forw and back 4823.0 1931 6650.0
pr1761_multigate GRU CUDA 512 512 4 forward 314.127014160156 650 47.5900001525879
pr1761_multigate GRU CUDA 512 512 4 backward 1488.0 3615 219.639999389648
pr1761_multigate GRU CUDA 512 512 4 forw and back 2077.0 5023 352.890014648437
pr1761_multigate GRU CPU 512 512 16 forward 7131.0 128 2460.0
pr1761_multigate GRU CPU 512 512 16 backward 14083.0 4332 38400.0
pr1761_multigate GRU CPU 512 512 16 forw and back 18896.0 7261 27090.0
pr1761_multigate GRU CUDA 512 512 16 forward 903.867980957031 2606 190.380004882812
pr1761_multigate GRU CUDA 512 512 16 backward 5050.0 14212 868.97998046875
pr1761_multigate GRU CUDA 512 512 16 forw and back 6913.0 19497 1350.0
pr1761_multigate GRU CPU 512 512 32 forward 14344.0 256 4970.0
pr1761_multigate GRU CPU 512 512 32 backward 28299.0 8588 76920.0
pr1761_multigate GRU CPU 512 512 32 forw and back 37622.0 14365 54340.0
pr1761_multigate GRU CUDA 512 512 32 forward 1677.0 5214 380.75
pr1761_multigate GRU CUDA 512 512 32 backward 9893.0 28340 1690.0
pr1761_multigate GRU CUDA 512 512 32 forw and back 13357.0 38793 2680.0
pr1761_multigate GRU CPU 512 512 64 forward 28770.0 512 9990.0
pr1761_multigate GRU CPU 512 512 64 backward 56732.0 17100 153970.0
pr1761_multigate GRU CPU 512 512 64 forw and back 74510.0 28573 108840.0
pr1761_multigate GRU CUDA 512 512 64 forward 3243.0 10430 761.52001953125
pr1761_multigate GRU CUDA 512 512 64 backward 19664.0 56596 3390.0
pr1761_multigate GRU CUDA 512 512 64 forw and back 26862.0 77385 5350.0
LSTM CPU c=32 n=32 ts=4
forward
36.511 μs (37 allocations: 71.14 KiB)
backward
157.997 μs (608 allocations: 233.70 KiB)
forw and back
328.358 μs (1495 allocations: 390.67 KiB)
LSTM CUDA c=32 n=32 ts=4
forward
331.403 μs (809 allocations: 56.45 KiB)
backward
1.063 ms (3075 allocations: 208.31 KiB)
forw and back
1.585 ms (4541 allocations: 309.48 KiB)
LSTM CPU c=32 n=32 ts=16
forward
163.047 μs (145 allocations: 296.23 KiB)
backward
588.481 μs (2300 allocations: 950.73 KiB)
forw and back
1.193 ms (5575 allocations: 1.53 MiB)
LSTM CUDA c=32 n=32 ts=16
forward
997.360 μs (3233 allocations: 225.67 KiB)
backward
3.549 ms (12219 allocations: 831.17 KiB)
forw and back
5.160 ms (17741 allocations: 1.18 MiB)
LSTM CPU c=32 n=32 ts=32
forward
335.327 μs (289 allocations: 596.36 KiB)
backward
1.162 ms (4557 allocations: 1.86 MiB)
forw and back
2.329 ms (11017 allocations: 3.07 MiB)
LSTM CUDA c=32 n=32 ts=32
forward
1.884 ms (6465 allocations: 451.30 KiB)
backward
6.835 ms (24412 allocations: 1.62 MiB)
forw and back
9.937 ms (35343 allocations: 2.36 MiB)
LSTM CPU c=32 n=32 ts=64
forward
677.271 μs (577 allocations: 1.17 MiB)
backward
2.348 ms (9069 allocations: 3.73 MiB)
forw and back
4.628 ms (21897 allocations: 6.14 MiB)
LSTM CUDA c=32 n=32 ts=64
forward
3.638 ms (12929 allocations: 902.56 KiB)
backward
13.431 ms (48796 allocations: 3.24 MiB)
forw and back
19.508 ms (70543 allocations: 4.71 MiB)
LSTM CPU c=32 n=128 ts=4
forward
139.940 μs (37 allocations: 276.64 KiB)
backward
275.686 μs (608 allocations: 722.83 KiB)
forw and back
582.452 μs (1495 allocations: 1.10 MiB)
LSTM CUDA c=32 n=128 ts=4
forward
343.437 μs (817 allocations: 56.58 KiB)
backward
1.101 ms (3119 allocations: 209.00 KiB)
forw and back
1.629 ms (4585 allocations: 310.17 KiB)
LSTM CPU c=32 n=128 ts=16
forward
588.313 μs (145 allocations: 1.13 MiB)
backward
1.068 ms (2300 allocations: 2.86 MiB)
forw and back
2.170 ms (5575 allocations: 4.44 MiB)
LSTM CUDA c=32 n=128 ts=16
forward
1.043 ms (3265 allocations: 226.17 KiB)
backward
3.662 ms (12395 allocations: 833.92 KiB)
forw and back
5.349 ms (17917 allocations: 1.19 MiB)
LSTM CPU c=32 n=128 ts=32
forward
1.190 ms (289 allocations: 2.27 MiB)
backward
2.180 ms (4557 allocations: 5.73 MiB)
forw and back
4.349 ms (11017 allocations: 8.91 MiB)
LSTM CUDA c=32 n=128 ts=32
forward
1.917 ms (6529 allocations: 452.30 KiB)
backward
6.973 ms (24764 allocations: 1.63 MiB)
forw and back
10.145 ms (35695 allocations: 2.36 MiB)
LSTM CPU c=32 n=128 ts=64
forward
2.394 ms (577 allocations: 4.56 MiB)
backward
4.499 ms (9069 allocations: 11.46 MiB)
forw and back
8.673 ms (21897 allocations: 17.84 MiB)
LSTM CUDA c=32 n=128 ts=64
forward
3.725 ms (13057 allocations: 904.56 KiB)
backward
13.667 ms (49500 allocations: 3.26 MiB)
forw and back
19.937 ms (71247 allocations: 4.72 MiB)
LSTM CPU c=32 n=512 ts=4
forward
1.348 ms (48 allocations: 1.07 MiB)
backward
1.674 ms (636 allocations: 2.60 MiB)
forw and back
3.579 ms (1546 allocations: 3.93 MiB)
LSTM CUDA c=32 n=512 ts=4
forward
349.663 μs (838 allocations: 56.91 KiB)
backward
1.156 ms (3147 allocations: 209.44 KiB)
forw and back
1.698 ms (4666 allocations: 312.08 KiB)
LSTM CPU c=32 n=512 ts=16
forward
5.774 ms (192 allocations: 4.45 MiB)
backward
7.088 ms (2412 allocations: 10.51 MiB)
forw and back
14.627 ms (5782 allocations: 15.99 MiB)
LSTM CUDA c=32 n=512 ts=16
forward
1.073 ms (3358 allocations: 227.62 KiB)
backward
3.879 ms (12519 allocations: 835.86 KiB)
forw and back
5.569 ms (18214 allocations: 1.19 MiB)
LSTM CPU c=32 n=512 ts=32
forward
9.449 ms (384 allocations: 8.97 MiB)
backward
14.147 ms (4781 allocations: 21.06 MiB)
forw and back
28.861 ms (11432 allocations: 32.07 MiB)
LSTM CUDA c=32 n=512 ts=32
forward
1.981 ms (6718 allocations: 455.25 KiB)
backward
7.394 ms (25016 allocations: 1.63 MiB)
forw and back
10.652 ms (36280 allocations: 2.37 MiB)
LSTM CPU c=32 n=512 ts=64
forward
23.437 ms (768 allocations: 17.99 MiB)
backward
28.574 ms (9517 allocations: 42.15 MiB)
forw and back
57.866 ms (22728 allocations: 64.22 MiB)
LSTM CUDA c=32 n=512 ts=64
forward
3.811 ms (13438 allocations: 910.52 KiB)
backward
14.319 ms (50008 allocations: 3.26 MiB)
forw and back
20.900 ms (72408 allocations: 4.74 MiB)
LSTM CPU c=128 n=32 ts=4
forward
50.745 μs (37 allocations: 71.14 KiB)
backward
199.522 μs (608 allocations: 413.70 KiB)
forw and back
396.784 μs (1495 allocations: 522.67 KiB)
LSTM CUDA c=128 n=32 ts=4
forward
364.986 μs (809 allocations: 56.45 KiB)
backward
1.135 ms (3075 allocations: 208.31 KiB)
forw and back
1.676 ms (4541 allocations: 309.48 KiB)
LSTM CPU c=128 n=32 ts=16
forward
218.906 μs (145 allocations: 296.23 KiB)
backward
755.577 μs (2300 allocations: 1.67 MiB)
forw and back
1.420 ms (5575 allocations: 2.08 MiB)
LSTM CUDA c=128 n=32 ts=16
forward
1.058 ms (3233 allocations: 225.67 KiB)
backward
3.679 ms (12219 allocations: 831.17 KiB)
forw and back
5.374 ms (17741 allocations: 1.18 MiB)
LSTM CPU c=128 n=32 ts=32
forward
447.689 μs (289 allocations: 596.36 KiB)
backward
1.522 ms (4557 allocations: 3.35 MiB)
forw and back
2.791 ms (11017 allocations: 4.18 MiB)
LSTM CUDA c=128 n=32 ts=32
forward
1.960 ms (6465 allocations: 451.30 KiB)
backward
7.010 ms (24412 allocations: 1.62 MiB)
forw and back
10.229 ms (35343 allocations: 2.36 MiB)
LSTM CPU c=128 n=32 ts=64
forward
906.688 μs (577 allocations: 1.17 MiB)
backward
3.214 ms (9069 allocations: 6.72 MiB)
forw and back
5.648 ms (21897 allocations: 8.38 MiB)
LSTM CUDA c=128 n=32 ts=64
forward
3.743 ms (12929 allocations: 902.56 KiB)
backward
13.742 ms (48796 allocations: 3.24 MiB)
forw and back
19.897 ms (70543 allocations: 4.71 MiB)
LSTM CPU c=128 n=128 ts=4
forward
493.348 μs (37 allocations: 276.64 KiB)
backward
996.974 μs (616 allocations: 1.16 MiB)
forw and back
1.866 ms (1499 allocations: 1.36 MiB)
LSTM CUDA c=128 n=128 ts=4
forward
357.844 μs (817 allocations: 56.58 KiB)
backward
1.130 ms (3119 allocations: 209.00 KiB)
forw and back
1.680 ms (4585 allocations: 310.17 KiB)
LSTM CPU c=128 n=128 ts=16
forward
2.087 ms (145 allocations: 1.13 MiB)
backward
4.089 ms (2332 allocations: 4.72 MiB)
forw and back
7.342 ms (5591 allocations: 5.56 MiB)
LSTM CUDA c=128 n=128 ts=16
forward
1.053 ms (3265 allocations: 226.17 KiB)
backward
3.694 ms (12395 allocations: 833.92 KiB)
forw and back
5.376 ms (17917 allocations: 1.19 MiB)
LSTM CPU c=128 n=128 ts=32
forward
4.222 ms (289 allocations: 2.27 MiB)
backward
8.297 ms (4621 allocations: 9.46 MiB)
forw and back
14.760 ms (11049 allocations: 11.14 MiB)
LSTM CUDA c=128 n=128 ts=32
forward
1.931 ms (6529 allocations: 452.30 KiB)
backward
7.032 ms (24764 allocations: 1.63 MiB)
forw and back
10.200 ms (35695 allocations: 2.36 MiB)
LSTM CPU c=128 n=128 ts=64
forward
8.167 ms (577 allocations: 4.56 MiB)
backward
16.636 ms (9197 allocations: 18.94 MiB)
forw and back
29.544 ms (21961 allocations: 22.32 MiB)
LSTM CUDA c=128 n=128 ts=64
forward
3.787 ms (13057 allocations: 904.56 KiB)
backward
13.967 ms (49500 allocations: 3.26 MiB)
forw and back
20.377 ms (71247 allocations: 4.72 MiB)
LSTM CPU c=128 n=512 ts=4
forward
1.435 ms (48 allocations: 1.07 MiB)
backward
1.989 ms (636 allocations: 4.18 MiB)
forw and back
3.844 ms (1546 allocations: 4.76 MiB)
LSTM CUDA c=128 n=512 ts=4
forward
380.586 μs (838 allocations: 56.91 KiB)
backward
1.202 ms (3147 allocations: 209.44 KiB)
forw and back
1.767 ms (4666 allocations: 312.08 KiB)
LSTM CPU c=128 n=512 ts=16
forward
3.478 ms (192 allocations: 4.45 MiB)
backward
8.329 ms (2412 allocations: 16.88 MiB)
forw and back
15.553 ms (5782 allocations: 19.35 MiB)
LSTM CUDA c=128 n=512 ts=16
forward
1.120 ms (3358 allocations: 227.62 KiB)
backward
3.991 ms (12519 allocations: 835.86 KiB)
forw and back
5.734 ms (18214 allocations: 1.19 MiB)
LSTM CPU c=128 n=512 ts=32
forward
12.533 ms (384 allocations: 8.97 MiB)
backward
16.724 ms (4781 allocations: 33.80 MiB)
forw and back
30.750 ms (11432 allocations: 38.80 MiB)
LSTM CUDA c=128 n=512 ts=32
forward
2.100 ms (6718 allocations: 455.25 KiB)
backward
7.663 ms (25016 allocations: 1.63 MiB)
forw and back
10.944 ms (36280 allocations: 2.37 MiB)
LSTM CPU c=128 n=512 ts=64
forward
25.193 ms (768 allocations: 17.99 MiB)
backward
33.564 ms (9517 allocations: 67.64 MiB)
forw and back
61.978 ms (22728 allocations: 77.71 MiB)
LSTM CUDA c=128 n=512 ts=64
forward
4.050 ms (13438 allocations: 910.52 KiB)
backward
14.960 ms (50008 allocations: 3.26 MiB)
forw and back
21.488 ms (72408 allocations: 4.74 MiB)
LSTM CPU c=512 n=32 ts=4
forward
282.930 μs (37 allocations: 71.14 KiB)
backward
867.287 μs (623 allocations: 1.11 MiB)
forw and back
1.494 ms (1506 allocations: 1.03 MiB)
LSTM CUDA c=512 n=32 ts=4
forward
365.976 μs (829 allocations: 56.77 KiB)
backward
1.133 ms (3107 allocations: 208.81 KiB)
forw and back
1.678 ms (4593 allocations: 310.30 KiB)
LSTM CPU c=512 n=32 ts=16
forward
1.185 ms (145 allocations: 296.23 KiB)
backward
3.641 ms (2363 allocations: 4.62 MiB)
forw and back
5.847 ms (5622 allocations: 4.28 MiB)
LSTM CUDA c=512 n=32 ts=16
forward
1.056 ms (3313 allocations: 226.92 KiB)
backward
3.712 ms (12347 allocations: 833.17 KiB)
forw and back
5.441 ms (17949 allocations: 1.19 MiB)
LSTM CPU c=512 n=32 ts=32
forward
2.399 ms (289 allocations: 596.36 KiB)
backward
7.361 ms (4684 allocations: 9.29 MiB)
forw and back
11.652 ms (11112 allocations: 8.63 MiB)
LSTM CUDA c=512 n=32 ts=32
forward
1.982 ms (6625 allocations: 453.80 KiB)
backward
7.102 ms (24668 allocations: 1.63 MiB)
forw and back
10.382 ms (35759 allocations: 2.37 MiB)
LSTM CPU c=512 n=32 ts=64
forward
4.867 ms (577 allocations: 1.17 MiB)
backward
14.773 ms (9324 allocations: 18.65 MiB)
forw and back
23.629 ms (22088 allocations: 17.32 MiB)
LSTM CUDA c=512 n=32 ts=64
forward
3.763 ms (13249 allocations: 907.56 KiB)
backward
13.788 ms (49308 allocations: 3.25 MiB)
forw and back
20.283 ms (71375 allocations: 4.72 MiB)
LSTM CPU c=512 n=128 ts=4
forward
558.467 μs (37 allocations: 276.64 KiB)
backward
1.349 ms (623 allocations: 2.99 MiB)
forw and back
2.169 ms (1506 allocations: 2.44 MiB)
LSTM CUDA c=512 n=128 ts=4
forward
406.861 μs (837 allocations: 56.89 KiB)
backward
1.166 ms (3151 allocations: 209.50 KiB)
forw and back
1.748 ms (4637 allocations: 310.98 KiB)
LSTM CPU c=512 n=128 ts=16
forward
2.391 ms (145 allocations: 1.13 MiB)
backward
5.799 ms (2363 allocations: 12.17 MiB)
forw and back
8.837 ms (5622 allocations: 10.01 MiB)
LSTM CUDA c=512 n=128 ts=16
forward
1.194 ms (3345 allocations: 227.42 KiB)
backward
3.846 ms (12523 allocations: 835.92 KiB)
forw and back
5.701 ms (18125 allocations: 1.19 MiB)
LSTM CPU c=512 n=128 ts=32
forward
4.913 ms (289 allocations: 2.27 MiB)
backward
11.581 ms (4684 allocations: 24.41 MiB)
forw and back
17.497 ms (11112 allocations: 20.09 MiB)
LSTM CUDA c=512 n=128 ts=32
forward
2.245 ms (6689 allocations: 454.80 KiB)
backward
7.364 ms (25020 allocations: 1.63 MiB)
forw and back
10.827 ms (36111 allocations: 2.37 MiB)
LSTM CPU c=512 n=128 ts=64
forward
9.983 ms (577 allocations: 4.56 MiB)
backward
23.183 ms (9324 allocations: 48.88 MiB)
forw and back
35.427 ms (22088 allocations: 40.26 MiB)
LSTM CUDA c=512 n=128 ts=64
forward
4.330 ms (13377 allocations: 909.56 KiB)
backward
14.335 ms (50012 allocations: 3.26 MiB)
forw and back
21.230 ms (72079 allocations: 4.73 MiB)
LSTM CPU c=512 n=512 ts=4
forward
1.688 ms (48 allocations: 1.07 MiB)
backward
3.557 ms (643 allocations: 10.51 MiB)
forw and back
5.102 ms (1553 allocations: 8.09 MiB)
LSTM CUDA c=512 n=512 ts=4
forward
399.813 μs (858 allocations: 57.22 KiB)
backward
1.241 ms (3179 allocations: 209.94 KiB)
forw and back
1.815 ms (4718 allocations: 312.89 KiB)
LSTM CPU c=512 n=512 ts=16
forward
7.528 ms (192 allocations: 4.45 MiB)
backward
14.394 ms (2443 allocations: 42.33 MiB)
forw and back
20.404 ms (5813 allocations: 32.80 MiB)
LSTM CUDA c=512 n=512 ts=16
forward
1.216 ms (3438 allocations: 228.88 KiB)
backward
4.185 ms (12647 allocations: 837.86 KiB)
forw and back
5.924 ms (18422 allocations: 1.20 MiB)
LSTM CPU c=512 n=512 ts=32
forward
15.272 ms (384 allocations: 8.97 MiB)
backward
29.294 ms (4844 allocations: 84.75 MiB)
forw and back
41.290 ms (11495 allocations: 65.75 MiB)
LSTM CUDA c=512 n=512 ts=32
forward
2.254 ms (6878 allocations: 457.75 KiB)
backward
8.176 ms (25272 allocations: 1.64 MiB)
forw and back
11.253 ms (36696 allocations: 2.38 MiB)
LSTM CPU c=512 n=512 ts=64
forward
30.405 ms (768 allocations: 17.99 MiB)
backward
58.086 ms (9644 allocations: 169.59 MiB)
forw and back
81.369 ms (22855 allocations: 131.65 MiB)
LSTM CUDA c=512 n=512 ts=64
forward
4.307 ms (13758 allocations: 915.52 KiB)
backward
16.188 ms (50520 allocations: 3.27 MiB)
forw and back
22.430 ms (73240 allocations: 4.75 MiB)
GRU CPU c=32 n=32 ts=4
forward
33.804 μs (25 allocations: 39.11 KiB)
backward
178.609 μs (844 allocations: 250.83 KiB)
forw and back
367.103 μs (1721 allocations: 424.92 KiB)
GRU CUDA c=32 n=32 ts=4
forward
382.705 μs (1005 allocations: 117.64 KiB)
backward
1.490 ms (3466 allocations: 219.28 KiB)
forw and back
2.087 ms (5025 allocations: 355.75 KiB)
GRU CPU c=32 n=32 ts=16
forward
146.961 μs (97 allocations: 165.20 KiB)
backward
717.275 μs (3149 allocations: 1.02 MiB)
forw and back
1.371 ms (6415 allocations: 1.69 MiB)
GRU CUDA c=32 n=32 ts=16
forward
1.158 ms (4017 allocations: 470.42 KiB)
backward
4.959 ms (13607 allocations: 867.50 KiB)
forw and back
6.879 ms (19535 allocations: 1.36 MiB)
GRU CPU c=32 n=32 ts=32
forward
305.891 μs (193 allocations: 333.33 KiB)
backward
1.411 ms (6221 allocations: 2.06 MiB)
forw and back
2.689 ms (12671 allocations: 3.40 MiB)
GRU CUDA c=32 n=32 ts=32
forward
2.169 ms (8033 allocations: 940.80 KiB)
backward
9.623 ms (27127 allocations: 1.69 MiB)
forw and back
13.479 ms (38879 allocations: 2.71 MiB)
GRU CPU c=32 n=32 ts=64
forward
617.131 μs (385 allocations: 669.59 KiB)
backward
2.846 ms (12365 allocations: 4.13 MiB)
forw and back
5.336 ms (25183 allocations: 6.81 MiB)
GRU CUDA c=32 n=32 ts=64
forward
4.278 ms (16065 allocations: 1.84 MiB)
backward
18.976 ms (54167 allocations: 3.38 MiB)
forw and back
26.450 ms (77567 allocations: 5.41 MiB)
GRU CPU c=32 n=128 ts=4
forward
127.137 μs (25 allocations: 151.11 KiB)
backward
312.210 μs (844 allocations: 751.95 KiB)
forw and back
612.394 μs (1721 allocations: 1.09 MiB)
GRU CUDA c=32 n=128 ts=4
forward
382.322 μs (1021 allocations: 117.89 KiB)
backward
1.505 ms (3502 allocations: 219.84 KiB)
forw and back
2.110 ms (5061 allocations: 356.31 KiB)
GRU CPU c=32 n=128 ts=16
forward
546.475 μs (97 allocations: 640.20 KiB)
backward
1.208 ms (3149 allocations: 3.10 MiB)
forw and back
2.254 ms (6415 allocations: 4.53 MiB)
GRU CUDA c=32 n=128 ts=16
forward
1.167 ms (4081 allocations: 471.42 KiB)
backward
5.157 ms (13751 allocations: 869.75 KiB)
forw and back
7.216 ms (19679 allocations: 1.36 MiB)
GRU CPU c=32 n=128 ts=32
forward
1.094 ms (193 allocations: 1.26 MiB)
backward
2.450 ms (6221 allocations: 6.26 MiB)
forw and back
4.498 ms (12671 allocations: 9.12 MiB)
GRU CUDA c=32 n=128 ts=32
forward
2.173 ms (8161 allocations: 942.80 KiB)
backward
9.933 ms (27415 allocations: 1.70 MiB)
forw and back
13.795 ms (39167 allocations: 2.71 MiB)
GRU CPU c=32 n=128 ts=64
forward
2.214 ms (385 allocations: 2.54 MiB)
backward
5.052 ms (12365 allocations: 12.58 MiB)
forw and back
8.956 ms (25183 allocations: 18.30 MiB)
GRU CUDA c=32 n=128 ts=64
forward
4.272 ms (16321 allocations: 1.84 MiB)
backward
19.251 ms (54743 allocations: 3.39 MiB)
forw and back
26.773 ms (78143 allocations: 5.42 MiB)
GRU CPU c=32 n=512 ts=4
forward
650.535 μs (32 allocations: 594.56 KiB)
backward
1.806 ms (880 allocations: 2.68 MiB)
forw and back
3.443 ms (1772 allocations: 3.74 MiB)
GRU CUDA c=32 n=512 ts=4
forward
380.510 μs (1042 allocations: 118.22 KiB)
backward
1.581 ms (3594 allocations: 223.59 KiB)
forw and back
2.205 ms (5214 allocations: 361.78 KiB)
GRU CPU c=32 n=512 ts=16
forward
5.215 ms (128 allocations: 2.46 MiB)
backward
7.626 ms (3305 allocations: 11.37 MiB)
forw and back
14.006 ms (6634 allocations: 15.76 MiB)
GRU CUDA c=32 n=512 ts=16
forward
1.169 ms (4174 allocations: 472.88 KiB)
backward
5.440 ms (14131 allocations: 884.94 KiB)
forw and back
7.500 ms (20264 allocations: 1.38 MiB)
GRU CPU c=32 n=512 ts=32
forward
10.522 ms (256 allocations: 4.97 MiB)
backward
15.234 ms (6537 allocations: 22.95 MiB)
forw and back
27.659 ms (13114 allocations: 31.77 MiB)
GRU CUDA c=32 n=512 ts=32
forward
2.190 ms (8350 allocations: 945.75 KiB)
backward
10.514 ms (28179 allocations: 1.73 MiB)
forw and back
14.553 ms (40328 allocations: 2.75 MiB)
GRU CPU c=32 n=512 ts=64
forward
21.181 ms (512 allocations: 9.99 MiB)
backward
30.551 ms (13001 allocations: 46.12 MiB)
forw and back
55.245 ms (26074 allocations: 63.80 MiB)
GRU CUDA c=32 n=512 ts=64
forward
4.341 ms (16702 allocations: 1.85 MiB)
backward
20.676 ms (56275 allocations: 3.45 MiB)
forw and back
28.481 ms (80456 allocations: 5.49 MiB)
GRU CPU c=128 n=32 ts=4
forward
45.944 μs (25 allocations: 39.11 KiB)
backward
227.995 μs (844 allocations: 409.83 KiB)
forw and back
436.912 μs (1721 allocations: 535.92 KiB)
GRU CUDA c=128 n=32 ts=4
forward
398.092 μs (1005 allocations: 117.64 KiB)
backward
1.530 ms (3466 allocations: 219.28 KiB)
forw and back
2.135 ms (5025 allocations: 355.75 KiB)
GRU CPU c=128 n=32 ts=16
forward
193.138 μs (97 allocations: 165.20 KiB)
backward
845.771 μs (3149 allocations: 1.67 MiB)
forw and back
1.542 ms (6415 allocations: 2.15 MiB)
GRU CUDA c=128 n=32 ts=16
forward
1.207 ms (4017 allocations: 470.42 KiB)
backward
5.151 ms (13607 allocations: 867.50 KiB)
forw and back
7.240 ms (19535 allocations: 1.36 MiB)
GRU CPU c=128 n=32 ts=32
forward
394.700 μs (193 allocations: 333.33 KiB)
backward
1.702 ms (6221 allocations: 3.36 MiB)
forw and back
3.038 ms (12671 allocations: 4.33 MiB)
GRU CUDA c=128 n=32 ts=32
forward
2.262 ms (8033 allocations: 940.80 KiB)
backward
9.877 ms (27127 allocations: 1.69 MiB)
forw and back
13.811 ms (38879 allocations: 2.71 MiB)
GRU CPU c=128 n=32 ts=64
forward
796.808 μs (385 allocations: 669.59 KiB)
backward
3.573 ms (12365 allocations: 6.75 MiB)
forw and back
6.142 ms (25183 allocations: 8.68 MiB)
GRU CUDA c=128 n=32 ts=64
forward
4.383 ms (16065 allocations: 1.84 MiB)
backward
19.106 ms (54167 allocations: 3.38 MiB)
forw and back
26.794 ms (77567 allocations: 5.41 MiB)
GRU CPU c=128 n=128 ts=4
forward
459.966 μs (25 allocations: 151.11 KiB)
backward
1.041 ms (852 allocations: 1.17 MiB)
forw and back
1.103 ms (1725 allocations: 1.34 MiB)
GRU CUDA c=128 n=128 ts=4
forward
389.419 μs (1021 allocations: 117.89 KiB)
backward
1.522 ms (3502 allocations: 219.84 KiB)
forw and back
2.154 ms (5061 allocations: 356.31 KiB)
GRU CPU c=128 n=128 ts=16
forward
1.207 ms (97 allocations: 640.20 KiB)
backward
4.282 ms (3181 allocations: 4.87 MiB)
forw and back
7.375 ms (6431 allocations: 5.55 MiB)
GRU CUDA c=128 n=128 ts=16
forward
1.176 ms (4081 allocations: 471.42 KiB)
backward
5.240 ms (13751 allocations: 869.75 KiB)
forw and back
7.321 ms (19679 allocations: 1.36 MiB)
GRU CPU c=128 n=128 ts=32
forward
3.869 ms (193 allocations: 1.26 MiB)
backward
6.375 ms (6285 allocations: 9.81 MiB)
forw and back
12.397 ms (12703 allocations: 11.17 MiB)
GRU CUDA c=128 n=128 ts=32
forward
2.221 ms (8161 allocations: 942.80 KiB)
backward
10.115 ms (27415 allocations: 1.70 MiB)
forw and back
14.042 ms (39167 allocations: 2.71 MiB)
GRU CPU c=128 n=128 ts=64
forward
7.791 ms (385 allocations: 2.54 MiB)
backward
17.688 ms (12493 allocations: 19.69 MiB)
forw and back
29.597 ms (25247 allocations: 22.41 MiB)
GRU CUDA c=128 n=128 ts=64
forward
4.373 ms (16321 allocations: 1.84 MiB)
backward
19.570 ms (54743 allocations: 3.39 MiB)
forw and back
26.788 ms (78143 allocations: 5.42 MiB)
GRU CPU c=128 n=512 ts=4
forward
1.295 ms (32 allocations: 594.56 KiB)
backward
2.074 ms (880 allocations: 4.24 MiB)
forw and back
3.677 ms (1772 allocations: 4.56 MiB)
GRU CUDA c=128 n=512 ts=4
forward
389.225 μs (1042 allocations: 118.22 KiB)
backward
1.583 ms (3594 allocations: 223.59 KiB)
forw and back
2.208 ms (5214 allocations: 361.78 KiB)
GRU CPU c=128 n=512 ts=16
forward
5.538 ms (128 allocations: 2.46 MiB)
backward
8.798 ms (3305 allocations: 17.64 MiB)
forw and back
14.756 ms (6634 allocations: 19.03 MiB)
GRU CUDA c=128 n=512 ts=16
forward
1.176 ms (4174 allocations: 472.88 KiB)
backward
5.420 ms (14131 allocations: 884.94 KiB)
forw and back
7.479 ms (20264 allocations: 1.38 MiB)
GRU CPU c=128 n=512 ts=32
forward
11.260 ms (256 allocations: 4.97 MiB)
backward
17.652 ms (6537 allocations: 35.50 MiB)
forw and back
29.339 ms (13114 allocations: 38.32 MiB)
GRU CUDA c=128 n=512 ts=32
forward
2.246 ms (8350 allocations: 945.75 KiB)
backward
10.603 ms (28179 allocations: 1.73 MiB)
forw and back
14.578 ms (40328 allocations: 2.75 MiB)
GRU CPU c=128 n=512 ts=64
forward
22.706 ms (512 allocations: 9.99 MiB)
backward
35.436 ms (13001 allocations: 71.24 MiB)
forw and back
59.069 ms (26074 allocations: 76.92 MiB)
GRU CUDA c=128 n=512 ts=64
forward
4.426 ms (16702 allocations: 1.85 MiB)
backward
20.861 ms (56275 allocations: 3.45 MiB)
forw and back
28.711 ms (80456 allocations: 5.49 MiB)
GRU CPU c=512 n=32 ts=4
forward
308.929 μs (25 allocations: 39.11 KiB)
backward
949.953 μs (859 allocations: 1.02 MiB)
forw and back
1.598 ms (1732 allocations: 979.06 KiB)
GRU CUDA c=512 n=32 ts=4
forward
396.805 μs (1025 allocations: 117.95 KiB)
backward
1.551 ms (3498 allocations: 219.78 KiB)
forw and back
2.157 ms (5077 allocations: 356.56 KiB)
GRU CPU c=512 n=32 ts=16
forward
1.251 ms (97 allocations: 165.20 KiB)
backward
3.884 ms (3212 allocations: 4.26 MiB)
forw and back
6.193 ms (6462 allocations: 3.99 MiB)
GRU CUDA c=512 n=32 ts=16
forward
1.193 ms (4097 allocations: 471.67 KiB)
backward
5.133 ms (13735 allocations: 869.50 KiB)
forw and back
7.216 ms (19743 allocations: 1.36 MiB)
GRU CPU c=512 n=32 ts=32
forward
2.510 ms (193 allocations: 333.33 KiB)
backward
7.843 ms (6348 allocations: 8.57 MiB)
forw and back
12.336 ms (12766 allocations: 8.04 MiB)
GRU CUDA c=512 n=32 ts=32
forward
2.221 ms (8193 allocations: 943.30 KiB)
backward
9.822 ms (27383 allocations: 1.70 MiB)
forw and back
13.770 ms (39295 allocations: 2.71 MiB)
GRU CPU c=512 n=32 ts=64
forward
5.059 ms (385 allocations: 669.59 KiB)
backward
15.663 ms (12620 allocations: 17.20 MiB)
forw and back
24.714 ms (25374 allocations: 16.13 MiB)
GRU CUDA c=512 n=32 ts=64
forward
4.358 ms (16385 allocations: 1.84 MiB)
backward
19.248 ms (54679 allocations: 3.39 MiB)
forw and back
26.556 ms (78399 allocations: 5.42 MiB)
GRU CPU c=512 n=128 ts=4
forward
535.009 μs (25 allocations: 151.11 KiB)
backward
1.459 ms (859 allocations: 2.92 MiB)
forw and back
2.211 ms (1732 allocations: 2.33 MiB)
GRU CUDA c=512 n=128 ts=4
forward
388.542 μs (1041 allocations: 118.20 KiB)
backward
1.544 ms (3534 allocations: 220.34 KiB)
forw and back
2.170 ms (5113 allocations: 357.12 KiB)
GRU CPU c=512 n=128 ts=16
forward
2.239 ms (97 allocations: 640.20 KiB)
backward
6.088 ms (3212 allocations: 11.96 MiB)
forw and back
8.836 ms (6462 allocations: 9.64 MiB)
GRU CUDA c=512 n=128 ts=16
forward
1.186 ms (4161 allocations: 472.67 KiB)
backward
5.222 ms (13879 allocations: 871.75 KiB)
forw and back
7.237 ms (19887 allocations: 1.36 MiB)
GRU CPU c=512 n=128 ts=32
forward
4.584 ms (193 allocations: 1.26 MiB)
backward
12.185 ms (6348 allocations: 24.02 MiB)
forw and back
17.606 ms (12766 allocations: 19.38 MiB)
GRU CUDA c=512 n=128 ts=32
forward
2.194 ms (8321 allocations: 945.30 KiB)
backward
10.055 ms (27671 allocations: 1.70 MiB)
forw and back
13.803 ms (39583 allocations: 2.72 MiB)
GRU CPU c=512 n=128 ts=64
forward
9.301 ms (385 allocations: 2.54 MiB)
backward
24.435 ms (12620 allocations: 48.14 MiB)
forw and back
35.196 ms (25374 allocations: 38.87 MiB)
GRU CUDA c=512 n=128 ts=64
forward
4.295 ms (16641 allocations: 1.85 MiB)
backward
19.502 ms (55255 allocations: 3.40 MiB)
forw and back
26.931 ms (78975 allocations: 5.43 MiB)
GRU CPU c=512 n=512 ts=4
forward
1.608 ms (32 allocations: 594.56 KiB)
backward
3.340 ms (887 allocations: 10.48 MiB)
forw and back
5.084 ms (1779 allocations: 7.80 MiB)
GRU CUDA c=512 n=512 ts=4
forward
435.326 μs (1062 allocations: 118.53 KiB)
backward
1.656 ms (3626 allocations: 224.09 KiB)
forw and back
2.307 ms (5266 allocations: 362.59 KiB)
GRU CPU c=512 n=512 ts=16
forward
7.007 ms (128 allocations: 2.46 MiB)
backward
15.175 ms (3336 allocations: 42.73 MiB)
forw and back
19.999 ms (6665 allocations: 32.12 MiB)
GRU CUDA c=512 n=512 ts=16
forward
1.338 ms (4254 allocations: 474.12 KiB)
backward
5.731 ms (14259 allocations: 886.94 KiB)
forw and back
7.789 ms (20472 allocations: 1.38 MiB)
GRU CPU c=512 n=512 ts=32
forward
14.150 ms (256 allocations: 4.97 MiB)
backward
30.405 ms (6600 allocations: 85.71 MiB)
forw and back
39.853 ms (13177 allocations: 64.54 MiB)
GRU CUDA c=512 n=512 ts=32
forward
2.516 ms (8510 allocations: 948.25 KiB)
backward
11.757 ms (28435 allocations: 1.73 MiB)
forw and back
15.063 ms (40744 allocations: 2.76 MiB)
GRU CPU c=512 n=512 ts=64
forward
28.400 ms (512 allocations: 9.99 MiB)
backward
60.943 ms (13128 allocations: 171.69 MiB)
forw and back
79.459 ms (26201 allocations: 129.37 MiB)
GRU CUDA c=512 n=512 ts=64
forward
4.888 ms (17022 allocations: 1.85 MiB)
backward
22.380 ms (56787 allocations: 3.46 MiB)
forw and back
29.432 ms (81288 allocations: 5.50 MiB)
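using Flux

# Wraps a recurrent layer so that a whole input sequence is processed in one call.
struct RNNWrapper{T}
    rnn::T
end
Flux.@functor RNNWrapper

# Reset the hidden state, run the layer over every timestep, and return the last output.
function (r::RNNWrapper)(xs)
    Flux.reset!(r.rnn)
    [r.rnn(x) for x in xs][end]
end

# NOTE: `run_benchmark` is not defined in this file; it is assumed to be loaded beforehand.
for rnn_type in [Flux.LSTM, Flux.GRU]
    for c in [32, 128, 512]          # input features
        rnn = RNNWrapper(rnn_type(c, 8))
        grnn = gpu(rnn)
        for n in [32, 128, 512]      # batch size
            for ts in [4, 16, 64]    # timesteps
                xs = [randn(Float32, c, n) for _ in 1:ts]
                println("$rnn_type CPU c=$c n=$n ts=$ts")
                run_benchmark(rnn, xs, cuda=false)
                println("$rnn_type CUDA c=$c n=$n ts=$ts")
                run_benchmark(grnn, xs, cuda=true)
            end
        end
    end
end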
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "params": [
    {
      "name": "device",
      "select": {
        "type": "point",
        "fields": ["device"]
      },
      "bind": "legend"
    }
  ],
  "data": {
    "name": "source",
    "url": "view_vs_slice.csv",
    "format": {
      "type": "csv"
    }
  },
  "mark": "point",
  "encoding": {
    "y": {
      "field": "timesteps",
      "type": "ordinal",
      "sort": [4, 16, 64]
    },
    "x": {
      "field": "time_μs",
      "type": "quantitative",
      "scale": {
        "type": "log"
      }
    },
    "column": {
      "field": "features",
      "type": "quantitative"
    },
    "row": {
      "field": "batch_size",
      "type": "quantitative"
    },
    "color": {
      "field": "branch",
      "type": "nominal"
    },
    "shape": {
      "field": "device",
      "type": "nominal"
    },
    "opacity": {
      "condition": {"param": "device", "value": 1},
      "value": 0.2
    }
  }
}
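The timing lines in the benchmark logs and the `time_μs`, `alloc_num` and `alloc_KiB` columns of the table below seem to carry the same measurements in different units. The following is a minimal sketch of how one log line could be turned into those three values. `parse_timing`, `TIME_TO_US`, `MEM_TO_KIB` and `TIMING_RE` are hypothetical names that are not part of the original gist, and the decimal MiB scaling is an assumption inferred from how the table rows appear to be scaled (for example `1.59 MiB` showing up as `1590.0`).

```julia
# Hypothetical helper, not part of the original gist.
# Factors that convert a printed time unit to microseconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
const TIME_TO_US = Dict(
    "ns" => 1e-3,
    "\u03bcs" => 1.0,
    "\u00b5s" => 1.0,
    "ms" => 1e3,
    "s" => 1e6,
)

# Assumption: 1 MiB is scaled to 1000 KiB, which is how the table rows appear to be scaled.
const MEM_TO_KIB = Dict("KiB" => 1.0, "MiB" => 1e3)

# Matches lines such as "1.079 ms (3075 allocations: 208.31 KiB)".
const TIMING_RE = r"^\s*([\d.]+)\s+(\S+)\s+\((\d+) allocations?: ([\d.]+)\s+(\S+)\)\s*$"

# Returns a named tuple (time_us, alloc_num, alloc_KiB),
# or `nothing` when the line is not a timing line or uses an unknown unit.
function parse_timing(line::AbstractString)
    m = match(TIMING_RE, line)
    m === nothing && return nothing
    time_unit, mem_unit = String(m[2]), String(m[5])
    (haskey(TIME_TO_US, time_unit) && haskey(MEM_TO_KIB, mem_unit)) || return nothing
    return (
        time_us = parse(Float64, m[1]) * TIME_TO_US[time_unit],
        alloc_num = parse(Int, m[3]),
        alloc_KiB = parse(Float64, m[4]) * MEM_TO_KIB[mem_unit],
    )
end
```

Under these assumptions, `parse_timing("1.079 ms (3075 allocations: 208.31 KiB)")` should give roughly 1079 μs, 3075 allocations and 208.31 KiB, which lines up with the `master LSTM CUDA 32 32 4 backward` row. Label lines such as `forward` return `nothing`, so they can be skipped or used to fill the `passes` column.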
branch rnn_type device features batch_size timesteps passes time_μs alloc_num alloc_KiB
master LSTM CPU 32 32 4 forward 40.2490005493164 53 88.1399993896484
master LSTM CPU 32 32 4 backward 155.457000732422 608 233.699996948242
master LSTM CPU 32 32 4 forw and back 336.703002929688 1495 405.920013427734
master LSTM CUDA 32 32 4 forward 449.045989990234 1289 70.6999969482422
master LSTM CUDA 32 32 4 backward 1079.0 3075 208.309997558594
master LSTM CUDA 32 32 4 forw and back 1702.0 5005 321.980010986328
master LSTM CPU 32 32 16 forward 177.645004272461 209 364.230010986328
master LSTM CPU 32 32 16 backward 584.898986816406 2300 950.72998046875
master LSTM CPU 32 32 16 forw and back 1219.0 5575 1590.0
master LSTM CUDA 32 32 16 forward 1384.0 5153 282.670013427734
master LSTM CUDA 32 32 16 backward 3566.0 12219 831.169982910156
master LSTM CUDA 32 32 16 forw and back 5574.0 19597 1230.0
master LSTM CPU 32 32 32 forward 360.834991455078 417 732.359985351562
master LSTM CPU 32 32 32 backward 1154.0 4557 1860.0
master LSTM CPU 32 32 32 forw and back 2370.0 11017 3190.0
master LSTM CUDA 32 32 32 forward 2631.0 10305 565.299987792969
master LSTM CUDA 32 32 32 backward 6852.0 24412 1620.0
master LSTM CUDA 32 32 32 forw and back 10746.0 39055 2460.0
master LSTM CPU 32 32 64 forward 733.223999023438 833 1430.0
master LSTM CPU 32 32 64 backward 2329.0 9069 3730.0
master LSTM CPU 32 32 64 forw and back 4713.0 21897 6380.0
master LSTM CUDA 32 32 64 forward 5068.0 20609 1100.0
master LSTM CUDA 32 32 64 backward 13402.0 48796 3240.0
master LSTM CUDA 32 32 64 forw and back 21120.0 77967 4910.0
master LSTM CPU 32 128 4 forward 153.962997436523 53 342.640014648437
master LSTM CPU 32 128 4 backward 273.554992675781 608 722.830017089844
master LSTM CPU 32 128 4 forw and back 597.818969726562 1495 1160.0
master LSTM CUDA 32 128 4 forward 467.039001464844 1297 70.8300018310547
master LSTM CUDA 32 128 4 backward 1109.0 3119 209.0
master LSTM CUDA 32 128 4 forw and back 1761.0 5049 322.670013427734
master LSTM CPU 32 128 16 forward 644.987976074219 209 1380.0
master LSTM CPU 32 128 16 backward 1065.0 2300 2860.0
master LSTM CPU 32 128 16 forw and back 2241.0 5575 4700.0
master LSTM CUDA 32 128 16 forward 1426.0 5185 283.170013427734
master LSTM CUDA 32 128 16 backward 3689.0 12395 833.919982910156
master LSTM CUDA 32 128 16 forw and back 5764.0 19773 1240.0
master LSTM CPU 32 128 32 forward 1302.0 417 2790.0
master LSTM CPU 32 128 32 backward 2168.0 4557 5730.0
master LSTM CPU 32 128 32 forw and back 4473.0 11017 9410.0
master LSTM CUDA 32 128 32 forward 2663.0 10369 566.299987792969
master LSTM CUDA 32 128 32 backward 7051.0 24764 1630.0
master LSTM CUDA 32 128 32 forw and back 10977.0 39407 2460.0
master LSTM CPU 32 128 64 forward 2627.0 833 5590.0
master LSTM CPU 32 128 64 backward 4497.0 9069 11460.0
master LSTM CPU 32 128 64 forw and back 8940.0 21897 18840.0
master LSTM CUDA 32 128 64 forward 5145.0 20737 1110.0
master LSTM CUDA 32 128 64 backward 13790.0 49500 3260.0
master LSTM CUDA 32 128 64 forw and back 21547.0 78671 4920.0
master LSTM CPU 32 512 4 forward 1477.0 64 1320.0
master LSTM CPU 32 512 4 backward 1701.0 636 2600.0
master LSTM CPU 32 512 4 forw and back 3646.0 1546 4180.0
master LSTM CUDA 32 512 4 forward 468.97900390625 1334 71.6600036621094
master LSTM CUDA 32 512 4 backward 1175.0 3147 209.440002441406
master LSTM CUDA 32 512 4 forw and back 1834.0 5146 325.079986572266
master LSTM CPU 32 512 16 forward 5619.0 256 5460.0
master LSTM CPU 32 512 16 backward 7090.0 2412 10510.0
master LSTM CPU 32 512 16 forw and back 14776.0 5782 16990.0
master LSTM CUDA 32 512 16 forward 1435.0 5342 286.619995117187
master LSTM CUDA 32 512 16 backward 3871.0 12519 835.859985351563
master LSTM CUDA 32 512 16 forw and back 6144.0 20134 1240.0
master LSTM CPU 32 512 32 forward 12595.0 512 10980.0
master LSTM CPU 32 512 32 backward 14100.0 4781 21060.0
master LSTM CPU 32 512 32 forw and back 29444.0 11432 34070.0
master LSTM CUDA 32 512 32 forward 2721.0 10686 573.25
master LSTM CUDA 32 512 32 backward 7438.0 25016 1630.0
master LSTM CUDA 32 512 32 forw and back 11761.0 40120 2480.0
master LSTM CPU 32 512 64 forward 25441.0 1024 22030.0
master LSTM CPU 32 512 64 backward 28548.0 9517 42150.0
master LSTM CPU 32 512 64 forw and back 58672.0 22728 68230.0
master LSTM CUDA 32 512 64 forward 5208.0 21374 1120.0
master LSTM CUDA 32 512 64 backward 14488.0 50008 3260.0
master LSTM CUDA 32 512 64 forw and back 23170.0 80088 4940.0
master LSTM CPU 128 32 4 forward 53.7029991149902 53 88.1399993896484
master LSTM CPU 128 32 4 backward 196.535995483398 608 413.700012207031
master LSTM CPU 128 32 4 forw and back 399.234985351562 1495 537.919982910156
master LSTM CUDA 128 32 4 forward 484.81298828125 1289 70.6999969482422
master LSTM CUDA 128 32 4 backward 1125.0 3075 208.309997558594
master LSTM CUDA 128 32 4 forw and back 1769.0 5005 321.980010986328
master LSTM CPU 128 32 16 forward 232.432998657227 209 364.230010986328
master LSTM CPU 128 32 16 backward 741.138000488281 2300 1670.0
master LSTM CPU 128 32 16 forw and back 1420.0 5575 2140.0
master LSTM CUDA 128 32 16 forward 1469.0 5153 282.670013427734
master LSTM CUDA 128 32 16 backward 3697.0 12219 831.169982910156
master LSTM CUDA 128 32 16 forw and back 5811.0 19597 1230.0
master LSTM CPU 128 32 32 forward 475.834991455078 417 732.359985351562
master LSTM CPU 128 32 32 backward 1496.0 4557 3350.0
master LSTM CPU 128 32 32 forw and back 2802.0 11017 4300.0
master LSTM CUDA 128 32 32 forward 2735.0 10305 565.299987792969
master LSTM CUDA 128 32 32 backward 7067.0 24412 1620.0
master LSTM CUDA 128 32 32 forw and back 11125.0 39055 2460.0
master LSTM CPU 128 32 64 forward 959.431030273438 833 1430.0
master LSTM CPU 128 32 64 backward 3159.0 9069 6720.0
master LSTM CPU 128 32 64 forw and back 5664.0 21897 8620.0
master LSTM CUDA 128 32 64 forward 5234.0 20609 1100.0
master LSTM CUDA 128 32 64 backward 13496.0 48796 3240.0
master LSTM CUDA 128 32 64 forw and back 21335.0 77967 4910.0
master LSTM CPU 128 128 4 forward 311.561004638672 53 342.640014648437
master LSTM CPU 128 128 4 backward 1003.0 616 1160.0
master LSTM CPU 128 128 4 forw and back 1899.0 1499 1430.0
master LSTM CUDA 128 128 4 forward 476.438995361328 1297 70.8300018310547
master LSTM CUDA 128 128 4 backward 1125.0 3119 209.0
master LSTM CUDA 128 128 4 forw and back 1769.0 5049 322.670013427734
master LSTM CPU 128 128 16 forward 2236.0 209 1380.0
master LSTM CPU 128 128 16 backward 4116.0 2332 4720.0
master LSTM CPU 128 128 16 forw and back 7460.0 5591 5810.0
master LSTM CUDA 128 128 16 forward 1438.0 5185 283.170013427734
master LSTM CUDA 128 128 16 backward 3694.0 12395 833.919982910156
master LSTM CUDA 128 128 16 forw and back 5800.0 19773 1240.0
master LSTM CPU 128 128 32 forward 4504.0 417 2790.0
master LSTM CPU 128 128 32 backward 8002.0 4621 9460.0
master LSTM CPU 128 128 32 forw and back 14503.0 11049 11650.0
master LSTM CUDA 128 128 32 forward 2700.0 10369 566.299987792969
master LSTM CUDA 128 128 32 backward 7140.0 24764 1630.0
master LSTM CUDA 128 128 32 forw and back 11057.0 39407 2460.0
master LSTM CPU 128 128 64 forward 9043.0 833 5590.0
master LSTM CPU 128 128 64 backward 16704.0 9197 18940.0
master LSTM CPU 128 128 64 forw and back 29730.0 21961 23320.0
master LSTM CUDA 128 128 64 forward 5205.0 20737 1110.0
master LSTM CUDA 128 128 64 backward 14014.0 49500 3260.0
master LSTM CUDA 128 128 64 forw and back 21858.0 78671 4920.0
master LSTM CPU 128 512 4 forward 1537.0 64 1320.0
master LSTM CPU 128 512 4 backward 2000.0 636 4180.0
master LSTM CPU 128 512 4 forw and back 3908.0 1546 5010.0
master LSTM CUDA 128 512 4 forward 498.242004394531 1334 71.6600036621094
master LSTM CUDA 128 512 4 backward 1193.0 3147 209.440002441406
master LSTM CUDA 128 512 4 forw and back 1856.0 5146 325.079986572266
master LSTM CPU 128 512 16 forward 6646.0 256 5460.0
master LSTM CPU 128 512 16 backward 8319.0 2412 16880.0
master LSTM CPU 128 512 16 forw and back 15881.0 5782 20350.0
master LSTM CUDA 128 512 16 forward 1488.0 5342 286.619995117187
master LSTM CUDA 128 512 16 backward 3926.0 12519 835.859985351563
master LSTM CUDA 128 512 16 forw and back 6031.0 20134 1240.0
master LSTM CPU 128 512 32 forward 13529.0 512 10980.0
master LSTM CPU 128 512 32 backward 16747.0 4781 33800.0
master LSTM CPU 128 512 32 forw and back 31685.0 11432 40810.0
master LSTM CUDA 128 512 32 forward 2830.0 10686 573.25
master LSTM CUDA 128 512 32 backward 7573.0 25016 1630.0
master LSTM CUDA 128 512 32 forw and back 11650.0 40120 2480.0
master LSTM CPU 128 512 64 forward 27174.0 1024 22030.0
master LSTM CPU 128 512 64 backward 33567.0 9517 67640.0
master LSTM CPU 128 512 64 forw and back 63216.0 22728 81710.0
master LSTM CUDA 128 512 64 forward 5410.0 21374 1120.0
master LSTM CUDA 128 512 64 backward 14782.0 50008 3260.0
master LSTM CUDA 128 512 64 forw and back 22877.0 80088 4940.0
master LSTM CPU 512 32 4 forward 296.654998779297 53 88.1399993896484
master LSTM CPU 512 32 4 backward 902.289978027344 623 1110.0
master LSTM CPU 512 32 4 forw and back 1507.0 1506 1040.0
master LSTM CUDA 512 32 4 forward 495.875 1309 71.0199966430664
master LSTM CUDA 512 32 4 backward 1152.0 3107 208.809997558594
master LSTM CUDA 512 32 4 forw and back 1801.0 5057 322.799987792969
master LSTM CPU 512 32 16 forward 1234.0 209 364.230010986328
master LSTM CPU 512 32 16 backward 3671.0 2363 4620.0
master LSTM CPU 512 32 16 forw and back 5887.0 5622 4340.0
master LSTM CUDA 512 32 16 forward 1492.0 5233 283.920013427734
master LSTM CUDA 512 32 16 backward 3738.0 12347 833.169982910156
master LSTM CUDA 512 32 16 forw and back 5898.0 19805 1240.0
master LSTM CPU 512 32 32 forward 2483.0 417 732.359985351562
master LSTM CPU 512 32 32 backward 7368.0 4684 9290.0
master LSTM CPU 512 32 32 forw and back 11759.0 11112 8750.0
master LSTM CUDA 512 32 32 forward 2761.0 10465 567.799987792969
master LSTM CUDA 512 32 32 backward 7149.0 24668 1630.0
master LSTM CUDA 512 32 32 forw and back 11274.0 39471 2460.0
master LSTM CPU 512 32 64 forward 5011.0 833 1430.0
master LSTM CPU 512 32 64 backward 12625.0 9324 18650.0
master LSTM CPU 512 32 64 forw and back 23286.0 22088 17560.0
master LSTM CUDA 512 32 64 forward 5330.0 20929 1110.0
master LSTM CUDA 512 32 64 backward 13971.0 49308 3250.0
master LSTM CUDA 512 32 64 forw and back 22006.0 78799 4920.0
master LSTM CPU 512 128 4 forward 598.161010742188 53 342.640014648437
master LSTM CPU 512 128 4 backward 1382.0 623 2990.0
master LSTM CPU 512 128 4 forw and back 2212.0 1506 2510.0
master LSTM CUDA 512 128 4 forward 529.643981933594 1317 71.1399993896484
master LSTM CUDA 512 128 4 backward 1173.0 3151 209.5
master LSTM CUDA 512 128 4 forw and back 1868.0 5101 323.480010986328
master LSTM CPU 512 128 16 forward 2532.0 209 1380.0
master LSTM CPU 512 128 16 backward 5782.0 2363 12170.0
master LSTM CPU 512 128 16 forw and back 8975.0 5622 10260.0
master LSTM CUDA 512 128 16 forward 1601.0 5265 284.420013427734
master LSTM CUDA 512 128 16 backward 3813.0 12523 835.919982910156
master LSTM CUDA 512 128 16 forw and back 6066.0 19981 1240.0
master LSTM CPU 512 128 32 forward 5206.0 417 2790.0
master LSTM CPU 512 128 32 backward 9077.0 4684 24410.0
master LSTM CPU 512 128 32 forw and back 17836.0 11112 20590.0
master LSTM CUDA 512 128 32 forward 2990.0 10529 568.799987792969
master LSTM CUDA 512 128 32 backward 7328.0 25020 1630.0
master LSTM CUDA 512 128 32 forw and back 11564.0 39823 2470.0
master LSTM CPU 512 128 64 forward 10575.0 833 5590.0
master LSTM CPU 512 128 64 backward 23239.0 9324 48880.0
master LSTM CPU 512 128 64 forw and back 35565.0 22088 41270.0
master LSTM CUDA 512 128 64 forward 5804.0 21057 1110.0
master LSTM CUDA 512 128 64 backward 14272.0 50012 3260.0
master LSTM CUDA 512 128 64 forw and back 22629.0 79503 4930.0
master LSTM CPU 512 512 4 forward 1775.0 64 1320.0
master LSTM CPU 512 512 4 backward 3575.0 643 10510.0
master LSTM CPU 512 512 4 forw and back 5246.0 1553 8340.0
master LSTM CUDA 512 512 4 forward 524.591979980469 1354 71.9700012207031
master LSTM CUDA 512 512 4 backward 1235.0 3179 209.940002441406
master LSTM CUDA 512 512 4 forw and back 1906.0 5198 325.890014648437
master LSTM CPU 512 512 16 forward 7986.0 256 5460.0
master LSTM CPU 512 512 16 backward 14480.0 2443 42330.0
master LSTM CPU 512 512 16 forw and back 20785.0 5813 33800.0
master LSTM CUDA 512 512 16 forward 1586.0 5422 287.880004882812
master LSTM CUDA 512 512 16 backward 4138.0 12647 837.859985351563
master LSTM CUDA 512 512 16 forw and back 6273.0 20342 1250.0
master LSTM CPU 512 512 32 forward 16118.0 512 10980.0
master LSTM CPU 512 512 32 backward 29159.0 4844 84750.0
master LSTM CPU 512 512 32 forw and back 41383.0 11495 67750.0
master LSTM CUDA 512 512 32 forward 3017.0 10846 575.75
master LSTM CUDA 512 512 32 backward 7970.0 25272 1640.0
master LSTM CUDA 512 512 32 forw and back 12031.0 40536 2480.0
master LSTM CPU 512 512 64 forward 32323.001953125 1024 22030.0
master LSTM CPU 512 512 64 backward 58327.0 9644 169590.0
master LSTM CPU 512 512 64 forw and back 82442.0 22855 135660.0
master LSTM CUDA 512 512 64 forward 5773.0 21694 1120.0
master LSTM CUDA 512 512 64 backward 16018.9990234375 50520 3270.0
master LSTM CUDA 512 512 64 forw and back 23525.0 80920 4960.0
master GRU CPU 32 32 4 forward 37.0559997558594 49 61.7000007629395
master GRU CPU 32 32 4 backward 171.009002685547 844 246.169998168945
master GRU CPU 32 32 4 forw and back 350.778991699219 1737 428.230010986328
master GRU CUDA 32 32 4 forward 486.434997558594 1429 82.1999969482422
master GRU CUDA 32 32 4 backward 1490.0 3466 213.529998779297
master GRU CUDA 32 32 4 forw and back 2230.0 5809 364.190002441406
master GRU CPU 32 32 16 forward 164.147994995117 193 264.299987792969
master GRU CPU 32 32 16 backward 672.080017089844 3149 1000.0
master GRU CPU 32 32 16 forw and back 1278.0 6479 1720.0
master GRU CUDA 32 32 16 forward 1452.0 5713 328.670013427734
master GRU CUDA 32 32 16 backward 4948.0 13607 844.5
master GRU CUDA 32 32 16 forw and back 7461.0 22671 1390.0
master GRU CPU 32 32 32 forward 338.151000976562 385 534.419982910156
master GRU CPU 32 32 32 backward 1315.0 6221 2020.0
master GRU CPU 32 32 32 forw and back 2501.0 12799 3440.0
master GRU CUDA 32 32 32 forward 2712.0 11425 657.299987792969
master GRU CUDA 32 32 32 backward 9540.0 27127 1650.0
master GRU CUDA 32 32 32 forw and back 14249.0 45151 2770.0
master GRU CPU 32 32 64 forward 682.411987304688 769 1050.0
master GRU CPU 32 32 64 backward 2647.0 12365 4060.0
master GRU CPU 32 32 64 forw and back 4982.0 25439 6900.0
master GRU CUDA 32 32 64 forward 5276.0 22849 1280.0
master GRU CUDA 32 32 64 backward 18634.0 54167 3290.0
master GRU CUDA 32 32 64 forw and back 27950.0 90111 5540.0
master GRU CPU 32 128 4 forward 145.167007446289 49 238.020004272461
master GRU CPU 32 128 4 backward 298.838012695312 844 747.299987792969
master GRU CPU 32 128 4 forw and back 595.940979003906 1737 1150.0
master GRU CUDA 32 128 4 forward 475.029998779297 1437 82.3300018310547
master GRU CUDA 32 128 4 backward 1497.0 3502 214.089996337891
master GRU CUDA 32 128 4 forw and back 2254.0 5845 364.75
master GRU CPU 32 128 16 forward 616.849975585938 193 1000.0
master GRU CPU 32 128 16 backward 1153.0 3149 3090.0
master GRU CPU 32 128 16 forw and back 2215.0 6479 4830.0
master GRU CUDA 32 128 16 forward 1444.0 5745 329.170013427734
master GRU CUDA 32 128 16 backward 5056.0 13751 846.75
master GRU CUDA 32 128 16 forw and back 7498.0 22815 1390.0
master GRU CPU 32 128 32 forward 1242.0 385 2020.0
master GRU CPU 32 128 32 backward 2346.0 6221 6230.0
master GRU CPU 32 128 32 forw and back 4434.0 12799 9730.0
master GRU CUDA 32 128 32 forward 2699.0 11489 658.299987792969
master GRU CUDA 32 128 32 backward 9703.0 27415 1650.0
master GRU CUDA 32 128 32 forw and back 14495.0 45439 2780.0
master GRU CPU 32 128 64 forward 2507.0 769 4070.00024414062
master GRU CPU 32 128 64 backward 4859.0 12365 12510.0
master GRU CPU 32 128 64 forw and back 8842.0 25439 19540.0
master GRU CUDA 32 128 64 forward 5261.0 22977 1290.0
master GRU CUDA 32 128 64 backward 18872.0 54743 3300.0
master GRU CUDA 32 128 64 forw and back 28257.0 90687 5550.0
master GRU CPU 32 512 4 forward 1406.0 56 933.469970703125
master GRU CPU 32 512 4 backward 1806.0 880 2670.0
master GRU CPU 32 512 4 forw and back 3487.0 1788 4060.0
master GRU CUDA 32 512 4 forward 462.93701171875 1458 82.6600036621094
master GRU CUDA 32 512 4 backward 1541.0 3594 217.839996337891
master GRU CUDA 32 512 4 forw and back 2293.0 5998 370.220001220703
master GRU CPU 32 512 16 forward 5263.0 224 3930.0
master GRU CPU 32 512 16 backward 7605.0 3305 11350.0
master GRU CPU 32 512 16 forw and back 14322.0 6698 17150.0
master GRU CUDA 32 512 16 forward 1437.0 5838 330.619995117187
master GRU CUDA 32 512 16 backward 5297.0 14131 861.940002441406
master GRU CUDA 32 512 16 forw and back 7771.0 23400 1410.0
master GRU CPU 32 512 32 forward 11747.0 448 7950.0
master GRU CPU 32 512 32 backward 15211.0 6537 22910.0
master GRU CPU 32 512 32 forw and back 28691.0 13242 34600.0
master GRU CUDA 32 512 32 forward 2719.0 11678 661.25
master GRU CUDA 32 512 32 backward 10270.0 28179 1680.0
master GRU CUDA 32 512 32 forw and back 15028.0 46600 2820.0
master GRU CPU 32 512 64 forward 23613.0 896 15990.0
master GRU CPU 32 512 64 backward 30489.0 13001 46050.0
master GRU CPU 32 512 64 forw and back 56960.0 26330 69500.0
master GRU CUDA 32 512 64 forward 5290.0 23358 1290.0
master GRU CUDA 32 512 64 backward 20239.0 56275 3360.0
master GRU CUDA 32 512 64 forw and back 30735.0 93000 5620.0
master GRU CPU 128 32 4 forward 48.1609992980957 49 61.7000007629395
master GRU CPU 128 32 4 backward 215.723007202148 844 405.170013427734
master GRU CPU 128 32 4 forw and back 408.123992919922 1737 539.22998046875
master GRU CUDA 128 32 4 forward 482.053009033203 1429 82.1999969482422
master GRU CUDA 128 32 4 backward 1496.0 3466 213.529998779297
master GRU CUDA 128 32 4 forw and back 2217.0 5809 364.190002441406
master GRU CPU 128 32 16 forward 209.447998046875 193 264.299987792969
master GRU CPU 128 32 16 backward 795.439025878906 3149 1650.0
master GRU CPU 128 32 16 forw and back 1444.0 6479 2180.0
master GRU CUDA 128 32 16 forward 1489.0 5713 328.670013427734
master GRU CUDA 128 32 16 backward 4985.0 13607 844.5
master GRU CUDA 128 32 16 forw and back 7435.0 22671 1390.0
master GRU CPU 128 32 32 forward 426.944000244141 385 534.419982910156
master GRU CPU 128 32 32 backward 1595.0 6221 3330.0
master GRU CPU 128 32 32 forw and back 2840.0 12799 4370.0
master GRU CUDA 128 32 32 forward 2745.0 11425 657.299987792969
master GRU CUDA 128 32 32 backward 9446.0 27127 1650.0
master GRU CUDA 128 32 32 forw and back 14175.0 45151 2770.0
master GRU CPU 128 32 64 forward 859.223022460937 769 1050.0
master GRU CPU 128 32 64 backward 3385.0 12365 6680.0
master GRU CPU 128 32 64 forw and back 5770.0 25439 8770.0
master GRU CUDA 128 32 64 forward 5302.0 22849 1280.0
master GRU CUDA 128 32 64 backward 18317.0 54167 3290.0
master GRU CUDA 128 32 64 forw and back 27563.0 90111 5540.0
master GRU CPU 128 128 4 forward 494.161987304687 49 238.020004272461
master GRU CPU 128 128 4 backward 1078.0 852 1170.0
master GRU CPU 128 128 4 forw and back 1902.0 1741 1400.0
master GRU CUDA 128 128 4 forward 465.380004882812 1437 82.3300018310547
master GRU CUDA 128 128 4 backward 1477.0 3502 214.089996337891
master GRU CUDA 128 128 4 forw and back 2239.0 5845 364.75
master GRU CPU 128 128 16 forward 2125.0 193 1000.0
master GRU CPU 128 128 16 backward 4355.0 3181 4860.0
master GRU CPU 128 128 16 forw and back 7485.0 6495 5850.0
master GRU CUDA 128 128 16 forward 1477.0 5745 329.170013427734
master GRU CUDA 128 128 16 backward 5129.0 13751 846.75
master GRU CUDA 128 128 16 forw and back 7632.0 22815 1390.0
master GRU CPU 128 128 32 forward 4230.0 385 2020.0
master GRU CPU 128 128 32 backward 8788.0 6285 9780.0
master GRU CPU 128 128 32 forw and back 14835.0 12831 11780.0
master GRU CUDA 128 128 32 forward 2758.0 11489 658.299987792969
master GRU CUDA 128 128 32 backward 9839.0 27415 1650.0
master GRU CUDA 128 128 32 forw and back 14647.0 45439 2780.0
master GRU CPU 128 128 64 forward 8546.0 769 4070.00024414062
master GRU CPU 128 128 64 backward 17662.0 12493 19620.0
master GRU CPU 128 128 64 forw and back 30030.0 25503 23650.0
master GRU CUDA 128 128 64 forward 5346.0 22977 1290.0
master GRU CUDA 128 128 64 backward 19017.0 54743 3300.0
master GRU CUDA 128 128 64 forw and back 28545.0 90687 5550.0
master GRU CPU 128 512 4 forward 1467.0 56 933.469970703125
master GRU CPU 128 512 4 backward 2078.0 880 4230.0
master GRU CPU 128 512 4 forw and back 3707.0 1788 4870.0
master GRU CUDA 128 512 4 forward 471.027008056641 1458 82.6600036621094
master GRU CUDA 128 512 4 backward 1575.0 3594 217.839996337891
master GRU CUDA 128 512 4 forw and back 2332.0 5998 370.220001220703
master GRU CPU 128 512 16 forward 6161.0 224 3930.0
master GRU CPU 128 512 16 backward 8740.0 3305 17620.0
master GRU CPU 128 512 16 forw and back 15252.0 6698 20420.0
master GRU CUDA 128 512 16 forward 1481.0 5838 330.619995117187
master GRU CUDA 128 512 16 backward 5411.0 14131 861.940002441406
master GRU CUDA 128 512 16 forw and back 7896.0 23400 1410.0
master GRU CPU 128 512 32 forward 10098.0 448 7950.0
master GRU CPU 128 512 32 backward 17682.0 6537 35470.0
master GRU CPU 128 512 32 forw and back 30554.0 13242 41150.0
master GRU CUDA 128 512 32 forward 2800.0 11678 661.25
master GRU CUDA 128 512 32 backward 10446.0 28179 1680.0
master GRU CUDA 128 512 32 forw and back 15398.0 46600 2820.0
master GRU CPU 128 512 64 forward 25142.0 896 15990.0
master GRU CPU 128 512 64 backward 35435.0 13001 71160.0
master GRU CPU 128 512 64 forw and back 60727.0 26330 82620.0
master GRU CUDA 128 512 64 forward 5479.0 23358 1290.0
master GRU CUDA 128 512 64 backward 20468.0 56275 3360.0
master GRU CUDA 128 512 64 forw and back 30247.0 93000 5620.0
master GRU CPU 512 32 4 forward 311.080993652344 49 61.7000007629395
master GRU CPU 512 32 4 backward 937.929992675781 859 1020.0
master GRU CPU 512 32 4 forw and back 1183.0 1748 982.380004882813
master GRU CUDA 512 32 4 forward 500.746002197266 1449 82.5199966430664
master GRU CUDA 512 32 4 backward 1524.0 3498 214.029998779297
master GRU CUDA 512 32 4 forw and back 2272.0 5861 365.0
master GRU CPU 512 32 16 forward 1297.0 193 264.299987792969
master GRU CPU 512 32 16 backward 3842.0 3212 4240.0
master GRU CPU 512 32 16 forw and back 6129.0 6526 4010.00024414062
master GRU CUDA 512 32 16 forward 1538.0 5793 329.920013427734
master GRU CUDA 512 32 16 backward 5103.0 13735 846.5
master GRU CUDA 512 32 16 forw and back 7769.0 22879 1400.0
master GRU CPU 512 32 32 forward 2623.0 385 534.419982910156
master GRU CPU 512 32 32 backward 7902.0 6348 8530.0
master GRU CPU 512 32 32 forw and back 12426.0 12894 8080.0
master GRU CUDA 512 32 32 forward 2858.0 11585 659.799987792969
master GRU CUDA 512 32 32 backward 9831.0 27383 1650.0
master GRU CUDA 512 32 32 forw and back 14704.0 45567 2780.0
master GRU CPU 512 32 64 forward 5302.0 769 1050.0
master GRU CPU 512 32 64 backward 15853.0 12620 17120.0
master GRU CPU 512 32 64 forw and back 24895.0 25630 16219.9990234375
master GRU CUDA 512 32 64 forward 5513.0 23169 1290.0
master GRU CUDA 512 32 64 backward 18977.0 54679 3300.0
master GRU CUDA 512 32 64 forw and back 28441.0 90943 5550.0
master GRU CPU 512 128 4 forward 587.296020507812 49 238.020004272461
master GRU CPU 512 128 4 backward 1421.0 859 2910.0
master GRU CPU 512 128 4 forw and back 2214.0 1748 2400.0
master GRU CUDA 512 128 4 forward 490.171997070312 1457 82.6399993896484
master GRU CUDA 512 128 4 backward 1532.0 3534 214.589996337891
master GRU CUDA 512 128 4 forw and back 2317.0 5897 365.559997558594
master GRU CPU 512 128 16 forward 2443.0 193 1000.0
master GRU CPU 512 128 16 backward 6062.0 3212 11940.0
master GRU CPU 512 128 16 forw and back 8953.0 6526 9940.0
master GRU CUDA 512 128 16 forward 1482.0 5825 330.420013427734
master GRU CUDA 512 128 16 backward 5141.0 13879 848.75
master GRU CUDA 512 128 16 forw and back 7694.0 23023 1400.0
master GRU CPU 512 128 32 forward 4997.0 385 2020.0
master GRU CPU 512 128 32 backward 12083.0 6348 23990.0
master GRU CPU 512 128 32 forw and back 17802.0 12894 19990.0
master GRU CUDA 512 128 32 forward 2768.0 11649 660.799987792969
master GRU CUDA 512 128 32 backward 9899.0 27671 1650.0
master GRU CUDA 512 128 32 forw and back 14722.0 45855 2790.0
master GRU CPU 512 128 64 forward 10132.0 769 4070.00024414062
master GRU CPU 512 128 64 backward 24323.0 12620 48070.0
master GRU CPU 512 128 64 forw and back 35536.0 25630 40100.0
master GRU CUDA 512 128 64 forward 5351.0 23297 1290.0
master GRU CUDA 512 128 64 backward 19270.0 55255 3310.0
master GRU CUDA 512 128 64 forw and back 28835.0 91519 5560.0
master GRU CPU 512 512 4 forward 1769.0 56 933.469970703125
master GRU CPU 512 512 4 backward 3704.0 887 10480.0
master GRU CPU 512 512 4 forw and back 5138.0 1795 8109.99951171875
master GRU CUDA 512 512 4 forward 522.494995117187 1478 82.9700012207031
master GRU CUDA 512 512 4 backward 1630.0 3626 218.339996337891
master GRU CUDA 512 512 4 forw and back 2422.0 6050 371.029998779297
master GRU CPU 512 512 16 forward 7603.0 224 3930.0
master GRU CPU 512 512 16 backward 14938.0 3336 42710.0
master GRU CPU 512 512 16 forw and back 20314.0 6729 33500.0
master GRU CUDA 512 512 16 forward 1633.0 5918 331.880004882812
master GRU CUDA 512 512 16 backward 5625.0 14259 863.940002441406
master GRU CUDA 512 512 16 forw and back 8189.0 23608 1420.0
master GRU CPU 512 512 32 forward 15317.0 448 7950.0
master GRU CPU 512 512 32 backward 30130.0 6600 85680.0
master GRU CPU 512 512 32 forw and back 40710.0 13305 67360.0
master GRU CUDA 512 512 32 forward 3070.0 11838 663.75
master GRU CUDA 512 512 32 backward 10983.0 28435 1680.0
master GRU CUDA 512 512 32 forw and back 15912.0 47016 2820.0
master GRU CPU 512 512 64 forward 30749.0 896 15990.0
master GRU CPU 512 512 64 backward 60205.0 13128 171620.0
master GRU CPU 512 512 64 forw and back 80993.0 26457 135070.0
master GRU CUDA 512 512 64 forward 6003.0 23678 1300.0
master GRU CUDA 512 512 64 backward 22428.0 56787 3370.0
master GRU CUDA 512 512 64 forw and back 31418.0 93832 5640.0
pr1761 LSTM CPU 32 32 4 forward 36.5110015869141 37 71.1399993896484
pr1761 LSTM CPU 32 32 4 backward 157.996994018555 608 233.699996948242
pr1761 LSTM CPU 32 32 4 forw and back 328.358001708984 1495 390.670013427734
pr1761 LSTM CUDA 32 32 4 forward 331.403015136719 809 56.4500007629395
pr1761 LSTM CUDA 32 32 4 backward 1063.0 3075 208.309997558594
pr1761 LSTM CUDA 32 32 4 forw and back 1585.0 4541 309.480010986328
pr1761 LSTM CPU 32 32 16 forward 163.046997070312 145 296.230010986328
pr1761 LSTM CPU 32 32 16 backward 588.481018066406 2300 950.72998046875
pr1761 LSTM CPU 32 32 16 forw and back 1193.0 5575 1530.0
pr1761 LSTM CUDA 32 32 16 forward 997.359985351562 3233 225.669998168945
pr1761 LSTM CUDA 32 32 16 backward 3549.0 12219 831.169982910156
pr1761 LSTM CUDA 32 32 16 forw and back 5160.0 17741 1180.0
pr1761 LSTM CPU 32 32 32 forward 335.326995849609 289 596.359985351562
pr1761 LSTM CPU 32 32 32 backward 1162.0 4557 1860.0
pr1761 LSTM CPU 32 32 32 forw and back 2329.0 11017 3070.0
pr1761 LSTM CUDA 32 32 32 forward 1884.0 6465 451.299987792969
pr1761 LSTM CUDA 32 32 32 backward 6835.0 24412 1620.0
pr1761 LSTM CUDA 32 32 32 forw and back 9937.0 35343 2360.0
pr1761 LSTM CPU 32 32 64 forward 677.27099609375 577 1170.0
pr1761 LSTM CPU 32 32 64 backward 2348.0 9069 3730.0
pr1761 LSTM CPU 32 32 64 forw and back 4628.0 21897 6140.0
pr1761 LSTM CUDA 32 32 64 forward 3638.0 12929 902.559997558594
pr1761 LSTM CUDA 32 32 64 backward 13431.0 48796 3240.0
pr1761 LSTM CUDA 32 32 64 forw and back 19508.0 70543 4710.0
pr1761 LSTM CPU 32 128 4 forward 139.940002441406 37 276.640014648437
pr1761 LSTM CPU 32 128 4 backward 275.686004638672 608 722.830017089844
pr1761 LSTM CPU 32 128 4 forw and back 582.452026367187 1495 1100.0
pr1761 LSTM CUDA 32 128 4 forward 343.43701171875 817 56.5800018310547
pr1761 LSTM CUDA 32 128 4 backward 1101.0 3119 209.0
pr1761 LSTM CUDA 32 128 4 forw and back 1629.0 4585 310.170013427734
pr1761 LSTM CPU 32 128 16 forward 588.31298828125 145 1130.0
pr1761 LSTM CPU 32 128 16 backward 1068.0 2300 2860.0
pr1761 LSTM CPU 32 128 16 forw and back 2170.0 5575 4440.0
pr1761 LSTM CUDA 32 128 16 forward 1043.0 3265 226.169998168945
pr1761 LSTM CUDA 32 128 16 backward 3662.0 12395 833.919982910156
pr1761 LSTM CUDA 32 128 16 forw and back 5349.0 17917 1190.0
pr1761 LSTM CPU 32 128 32 forward 1190.0 289 2270.0
pr1761 LSTM CPU 32 128 32 backward 2180.0 4557 5730.0
pr1761 LSTM CPU 32 128 32 forw and back 4349.0 11017 8910.0
pr1761 LSTM CUDA 32 128 32 forward 1917.0 6529 452.299987792969
pr1761 LSTM CUDA 32 128 32 backward 6973.0 24764 1630.0
pr1761 LSTM CUDA 32 128 32 forw and back 10145.0 35695 2360.0
pr1761 LSTM CPU 32 128 64 forward 2394.0 577 4560.0
pr1761 LSTM CPU 32 128 64 backward 4499.0 9069 11460.0
pr1761 LSTM CPU 32 128 64 forw and back 8673.0 21897 17840.0
pr1761 LSTM CUDA 32 128 64 forward 3725.0 13057 904.559997558594
pr1761 LSTM CUDA 32 128 64 backward 13667.0 49500 3260.0
pr1761 LSTM CUDA 32 128 64 forw and back 19937.0 71247 4720.0
pr1761 LSTM CPU 32 512 4 forward 1348.0 48 1070.0
pr1761 LSTM CPU 32 512 4 backward 1674.0 636 2600.0
pr1761 LSTM CPU 32 512 4 forw and back 3579.0 1546 3930.0
pr1761 LSTM CUDA 32 512 4 forward 349.662994384766 838 56.9099998474121
pr1761 LSTM CUDA 32 512 4 backward 1156.0 3147 209.440002441406
pr1761 LSTM CUDA 32 512 4 forw and back 1698.0 4666 312.079986572266
pr1761 LSTM CPU 32 512 16 forward 5774.0 192 4450.0
pr1761 LSTM CPU 32 512 16 backward 7088.0 2412 10510.0
pr1761 LSTM CPU 32 512 16 forw and back 14627.0 5782 15990.0
pr1761 LSTM CUDA 32 512 16 forward 1073.0 3358 227.619995117188
pr1761 LSTM CUDA 32 512 16 backward 3879.0 12519 835.859985351563
pr1761 LSTM CUDA 32 512 16 forw and back 5569.0 18214 1190.0
pr1761 LSTM CPU 32 512 32 forward 9449.0 384 8970.0
pr1761 LSTM CPU 32 512 32 backward 14147.0 4781 21060.0
pr1761 LSTM CPU 32 512 32 forw and back 28861.0 11432 32070.0
pr1761 LSTM CUDA 32 512 32 forward 1981.0 6718 455.25
pr1761 LSTM CUDA 32 512 32 backward 7394.0 25016 1630.0
pr1761 LSTM CUDA 32 512 32 forw and back 10652.0 36280 2370.0
pr1761 LSTM CPU 32 512 64 forward 23437.0 768 17990.0
pr1761 LSTM CPU 32 512 64 backward 28574.0 9517 42150.0
pr1761 LSTM CPU 32 512 64 forw and back 57866.0 22728 64220.0
pr1761 LSTM CUDA 32 512 64 forward 3811.0 13438 910.52001953125
pr1761 LSTM CUDA 32 512 64 backward 14319.0 50008 3260.0
pr1761 LSTM CUDA 32 512 64 forw and back 20900.0 72408 4740.0
pr1761 LSTM CPU 128 32 4 forward 50.7449989318848 37 71.1399993896484
pr1761 LSTM CPU 128 32 4 backward 199.522003173828 608 413.700012207031
pr1761 LSTM CPU 128 32 4 forw and back 396.783996582031 1495 522.669982910156
pr1761 LSTM CUDA 128 32 4 forward 364.985992431641 809 56.4500007629395
pr1761 LSTM CUDA 128 32 4 backward 1135.0 3075 208.309997558594
pr1761 LSTM CUDA 128 32 4 forw and back 1676.0 4541 309.480010986328
pr1761 LSTM CPU 128 32 16 forward 218.906005859375 145 296.230010986328
pr1761 LSTM CPU 128 32 16 backward 755.577026367188 2300 1670.0
pr1761 LSTM CPU 128 32 16 forw and back 1420.0 5575 2080.0
pr1761 LSTM CUDA 128 32 16 forward 1058.0 3233 225.669998168945
pr1761 LSTM CUDA 128 32 16 backward 3679.0 12219 831.169982910156
pr1761 LSTM CUDA 128 32 16 forw and back 5374.0 17741 1180.0
pr1761 LSTM CPU 128 32 32 forward 447.688995361328 289 596.359985351562
pr1761 LSTM CPU 128 32 32 backward 1522.0 4557 3350.0
pr1761 LSTM CPU 128 32 32 forw and back 2791.0 11017 4180.0
pr1761 LSTM CUDA 128 32 32 forward 1960.0 6465 451.299987792969
pr1761 LSTM CUDA 128 32 32 backward 7010.0 24412 1620.0
pr1761 LSTM CUDA 128 32 32 forw and back 10229.0 35343 2360.0
pr1761 LSTM CPU 128 32 64 forward 906.68798828125 577 1170.0
pr1761 LSTM CPU 128 32 64 backward 3214.0 9069 6720.0
pr1761 LSTM CPU 128 32 64 forw and back 5648.0 21897 8380.0
pr1761 LSTM CUDA 128 32 64 forward 3743.0 12929 902.559997558594
pr1761 LSTM CUDA 128 32 64 backward 13742.0 48796 3240.0
pr1761 LSTM CUDA 128 32 64 forw and back 19897.0 70543 4710.0
pr1761 LSTM CPU 128 128 4 forward 493.347991943359 37 276.640014648437
pr1761 LSTM CPU 128 128 4 backward 996.973999023438 616 1160.0
pr1761 LSTM CPU 128 128 4 forw and back 1866.0 1499 1360.0
pr1761 LSTM CUDA 128 128 4 forward 357.843994140625 817 56.5800018310547
pr1761 LSTM CUDA 128 128 4 backward 1130.0 3119 209.0
pr1761 LSTM CUDA 128 128 4 forw and back 1680.0 4585 310.170013427734
pr1761 LSTM CPU 128 128 16 forward 2087.0 145 1130.0
pr1761 LSTM CPU 128 128 16 backward 4089.00024414062 2332 4720.0
pr1761 LSTM CPU 128 128 16 forw and back 7342.0 5591 5560.0
pr1761 LSTM CUDA 128 128 16 forward 1053.0 3265 226.169998168945
pr1761 LSTM CUDA 128 128 16 backward 3694.0 12395 833.919982910156
pr1761 LSTM CUDA 128 128 16 forw and back 5376.0 17917 1190.0
pr1761 LSTM CPU 128 128 32 forward 4222.0 289 2270.0
pr1761 LSTM CPU 128 128 32 backward 8297.0 4621 9460.0
pr1761 LSTM CPU 128 128 32 forw and back 14760.0 11049 11140.0
pr1761 LSTM CUDA 128 128 32 forward 1931.0 6529 452.299987792969
pr1761 LSTM CUDA 128 128 32 backward 7032.0 24764 1630.0
pr1761 LSTM CUDA 128 128 32 forw and back 10200.0 35695 2360.0
pr1761 LSTM CPU 128 128 64 forward 8167.0 577 4560.0
pr1761 LSTM CPU 128 128 64 backward 16636.0 9197 18940.0
pr1761 LSTM CPU 128 128 64 forw and back 29544.0 21961 22320.0
pr1761 LSTM CUDA 128 128 64 forward 3787.0 13057 904.559997558594
pr1761 LSTM CUDA 128 128 64 backward 13967.0 49500 3260.0
pr1761 LSTM CUDA 128 128 64 forw and back 20377.0 71247 4720.0
pr1761 LSTM CPU 128 512 4 forward 1435.0 48 1070.0
pr1761 LSTM CPU 128 512 4 backward 1989.0 636 4180.0
pr1761 LSTM CPU 128 512 4 forw and back 3844.0 1546 4760.0
pr1761 LSTM CUDA 128 512 4 forward 380.585998535156 838 56.9099998474121
pr1761 LSTM CUDA 128 512 4 backward 1202.0 3147 209.440002441406
pr1761 LSTM CUDA 128 512 4 forw and back 1767.0 4666 312.079986572266
pr1761 LSTM CPU 128 512 16 forward 3478.0 192 4450.0
pr1761 LSTM CPU 128 512 16 backward 8329.0 2412 16880.0
pr1761 LSTM CPU 128 512 16 forw and back 15553.0 5782 19350.0
pr1761 LSTM CUDA 128 512 16 forward 1120.0 3358 227.619995117188
pr1761 LSTM CUDA 128 512 16 backward 3991.0 12519 835.859985351563
pr1761 LSTM CUDA 128 512 16 forw and back 5734.0 18214 1190.0
pr1761 LSTM CPU 128 512 32 forward 12533.0 384 8970.0
pr1761 LSTM CPU 128 512 32 backward 16724.0 4781 33800.0
pr1761 LSTM CPU 128 512 32 forw and back 30750.0 11432 38800.0
pr1761 LSTM CUDA 128 512 32 forward 2100.0 6718 455.25
pr1761 LSTM CUDA 128 512 32 backward 7663.0 25016 1630.0
pr1761 LSTM CUDA 128 512 32 forw and back 10944.0 36280 2370.0
pr1761 LSTM CPU 128 512 64 forward 25193.0 768 17990.0
pr1761 LSTM CPU 128 512 64 backward 33564.0 9517 67640.0
pr1761 LSTM CPU 128 512 64 forw and back 61978.0 22728 77710.0
pr1761 LSTM CUDA 128 512 64 forward 4050.00024414062 13438 910.52001953125
pr1761 LSTM CUDA 128 512 64 backward 14960.0 50008 3260.0
pr1761 LSTM CUDA 128 512 64 forw and back 21488.0 72408 4740.0
pr1761 LSTM CPU 512 32 4 forward 282.929992675781 37 71.1399993896484
pr1761 LSTM CPU 512 32 4 backward 867.286987304688 623 1110.0
pr1761 LSTM CPU 512 32 4 forw and back 1494.0 1506 1030.0
pr1761 LSTM CUDA 512 32 4 forward 365.976013183594 829 56.7700004577637
pr1761 LSTM CUDA 512 32 4 backward 1133.0 3107 208.809997558594
pr1761 LSTM CUDA 512 32 4 forw and back 1678.0 4593 310.299987792969
pr1761 LSTM CPU 512 32 16 forward 1185.0 145 296.230010986328
pr1761 LSTM CPU 512 32 16 backward 3641.0 2363 4620.0
pr1761 LSTM CPU 512 32 16 forw and back 5847.0 5622 4280.0
pr1761 LSTM CUDA 512 32 16 forward 1056.0 3313 226.919998168945
pr1761 LSTM CUDA 512 32 16 backward 3712.0 12347 833.169982910156
pr1761 LSTM CUDA 512 32 16 forw and back 5441.0 17949 1190.0
pr1761 LSTM CPU 512 32 32 forward 2399.0 289 596.359985351562
pr1761 LSTM CPU 512 32 32 backward 7361.0 4684 9290.0
pr1761 LSTM CPU 512 32 32 forw and back 11652.0 11112 8630.0
pr1761 LSTM CUDA 512 32 32 forward 1982.0 6625 453.799987792969
pr1761 LSTM CUDA 512 32 32 backward 7102.0 24668 1630.0
pr1761 LSTM CUDA 512 32 32 forw and back 10382.0 35759 2370.0
pr1761 LSTM CPU 512 32 64 forward 4867.0 577 1170.0
pr1761 LSTM CPU 512 32 64 backward 14773.0 9324 18650.0
pr1761 LSTM CPU 512 32 64 forw and back 23629.0 22088 17320.0
pr1761 LSTM CUDA 512 32 64 forward 3763.0 13249 907.559997558594
pr1761 LSTM CUDA 512 32 64 backward 13788.0 49308 3250.0
pr1761 LSTM CUDA 512 32 64 forw and back 20283.0 71375 4720.0
pr1761 LSTM CPU 512 128 4 forward 558.466979980469 37 276.640014648437
pr1761 LSTM CPU 512 128 4 backward 1349.0 623 2990.0
pr1761 LSTM CPU 512 128 4 forw and back 2169.0 1506 2440.0
pr1761 LSTM CUDA 512 128 4 forward 406.860992431641 837 56.8899993896484
pr1761 LSTM CUDA 512 128 4 backward 1166.0 3151 209.5
pr1761 LSTM CUDA 512 128 4 forw and back 1748.0 4637 310.980010986328
pr1761 LSTM CPU 512 128 16 forward 2391.0 145 1130.0
pr1761 LSTM CPU 512 128 16 backward 5799.0 2363 12170.0
pr1761 LSTM CPU 512 128 16 forw and back 8837.0 5622 10010.0
pr1761 LSTM CUDA 512 128 16 forward 1194.0 3345 227.419998168945
pr1761 LSTM CUDA 512 128 16 backward 3846.0 12523 835.919982910156
pr1761 LSTM CUDA 512 128 16 forw and back 5701.0 18125 1190.0
pr1761 LSTM CPU 512 128 32 forward 4913.0 289 2270.0
pr1761 LSTM CPU 512 128 32 backward 11581.0 4684 24410.0
pr1761 LSTM CPU 512 128 32 forw and back 17497.0 11112 20090.0
pr1761 LSTM CUDA 512 128 32 forward 2245.0 6689 454.799987792969
pr1761 LSTM CUDA 512 128 32 backward 7364.0 25020 1630.0
pr1761 LSTM CUDA 512 128 32 forw and back 10827.0 36111 2370.0
pr1761 LSTM CPU 512 128 64 forward 9983.0 577 4560.0
pr1761 LSTM CPU 512 128 64 backward 23183.0 9324 48880.0
pr1761 LSTM CPU 512 128 64 forw and back 35427.0 22088 40260.0
pr1761 LSTM CUDA 512 128 64 forward 4330.0 13377 909.559997558594
pr1761 LSTM CUDA 512 128 64 backward 14335.0 50012 3260.0
pr1761 LSTM CUDA 512 128 64 forw and back 21230.0 72079 4730.0
pr1761 LSTM CPU 512 512 4 forward 1688.0 48 1070.0
pr1761 LSTM CPU 512 512 4 backward 3557.0 643 10510.0
pr1761 LSTM CPU 512 512 4 forw and back 5102.0 1553 8090.0
pr1761 LSTM CUDA 512 512 4 forward 399.81298828125 858 57.2200012207031
pr1761 LSTM CUDA 512 512 4 backward 1241.0 3179 209.940002441406
pr1761 LSTM CUDA 512 512 4 forw and back 1815.0 4718 312.890014648437
pr1761 LSTM CPU 512 512 16 forward 7528.0 192 4450.0
pr1761 LSTM CPU 512 512 16 backward 14394.0 2443 42330.0
pr1761 LSTM CPU 512 512 16 forw and back 20404.0 5813 32800.0
pr1761 LSTM CUDA 512 512 16 forward 1216.0 3438 228.880004882812
pr1761 LSTM CUDA 512 512 16 backward 4185.0 12647 837.859985351563
pr1761 LSTM CUDA 512 512 16 forw and back 5924.0 18422 1200.0
pr1761 LSTM CPU 512 512 32 forward 15272.0 384 8970.0
pr1761 LSTM CPU 512 512 32 backward 29294.0 4844 84750.0
pr1761 LSTM CPU 512 512 32 forw and back 41290.0 11495 65750.0
pr1761 LSTM CUDA 512 512 32 forward 2254.0 6878 457.75
pr1761 LSTM CUDA 512 512 32 backward 8175.99951171875 25272 1640.0
pr1761 LSTM CUDA 512 512 32 forw and back 11253.0 36696 2380.0
pr1761 LSTM CPU 512 512 64 forward 30405.0 768 17990.0
pr1761 LSTM CPU 512 512 64 backward 58086.0 9644 169590.0
pr1761 LSTM CPU 512 512 64 forw and back 81369.0 22855 131650.0
pr1761 LSTM CUDA 512 512 64 forward 4307.0 13758 915.52001953125
pr1761 LSTM CUDA 512 512 64 backward 16188.0 50520 3270.0
pr1761 LSTM CUDA 512 512 64 forw and back 22430.0 73240 4750.0
pr1761 GRU CPU 32 32 4 forward 33.8040008544922 25 39.1100006103516
pr1761 GRU CPU 32 32 4 backward 178.608993530273 844 250.830001831055
pr1761 GRU CPU 32 32 4 forw and back 367.102996826172 1721 424.920013427734
pr1761 GRU CUDA 32 32 4 forward 382.704986572266 1005 117.639999389648
pr1761 GRU CUDA 32 32 4 backward 1490.0 3466 219.279998779297
pr1761 GRU CUDA 32 32 4 forw and back 2087.0 5025 355.75
pr1761 GRU CPU 32 32 16 forward 146.960998535156 97 165.199996948242
pr1761 GRU CPU 32 32 16 backward 717.275024414063 3149 1020.0
pr1761 GRU CPU 32 32 16 forw and back 1371.0 6415 1690.0
pr1761 GRU CUDA 32 32 16 forward 1158.0 4017 470.420013427734
pr1761 GRU CUDA 32 32 16 backward 4959.0 13607 867.5
pr1761 GRU CUDA 32 32 16 forw and back 6879.0 19535 1360.0
pr1761 GRU CPU 32 32 32 forward 305.890991210937 193 333.329986572266
pr1761 GRU CPU 32 32 32 backward 1411.0 6221 2060.0
pr1761 GRU CPU 32 32 32 forw and back 2689.0 12671 3400.0
pr1761 GRU CUDA 32 32 32 forward 2169.0 8033 940.799987792969
pr1761 GRU CUDA 32 32 32 backward 9623.0 27127 1690.0
pr1761 GRU CUDA 32 32 32 forw and back 13479.0 38879 2710.0
pr1761 GRU CPU 32 32 64 forward 617.130981445313 385 669.590026855469
pr1761 GRU CPU 32 32 64 backward 2846.0 12365 4130.0
pr1761 GRU CPU 32 32 64 forw and back 5336.0 25183 6810.0
pr1761 GRU CUDA 32 32 64 forward 4278.0 16065 1840.0
pr1761 GRU CUDA 32 32 64 backward 18976.0 54167 3380.0
pr1761 GRU CUDA 32 32 64 forw and back 26450.0 77567 5410.0
pr1761 GRU CPU 32 128 4 forward 127.137001037598 25 151.110000610352
pr1761 GRU CPU 32 128 4 backward 312.209991455078 844 751.950012207031
pr1761 GRU CPU 32 128 4 forw and back 612.393981933594 1721 1090.0
pr1761 GRU CUDA 32 128 4 forward 382.321990966797 1021 117.889999389648
pr1761 GRU CUDA 32 128 4 backward 1505.0 3502 219.839996337891
pr1761 GRU CUDA 32 128 4 forw and back 2110.0 5061 356.309997558594
pr1761 GRU CPU 32 128 16 forward 546.474975585937 97 640.200012207031
pr1761 GRU CPU 32 128 16 backward 1208.0 3149 3100.0
pr1761 GRU CPU 32 128 16 forw and back 2254.0 6415 4530.0
pr1761 GRU CUDA 32 128 16 forward 1167.0 4081 471.420013427734
pr1761 GRU CUDA 32 128 16 backward 5157.0 13751 869.75
pr1761 GRU CUDA 32 128 16 forw and back 7216.0 19679 1360.0
pr1761 GRU CPU 32 128 32 forward 1094.0 193 1260.0
pr1761 GRU CPU 32 128 32 backward 2450.0 6221 6260.0
pr1761 GRU CPU 32 128 32 forw and back 4498.0 12671 9120.0
pr1761 GRU CUDA 32 128 32 forward 2173.0 8161 942.799987792969
pr1761 GRU CUDA 32 128 32 backward 9933.0 27415 1700.0
pr1761 GRU CUDA 32 128 32 forw and back 13795.0 39167 2710.0
pr1761 GRU CPU 32 128 64 forward 2214.0 385 2540.0
pr1761 GRU CPU 32 128 64 backward 5052.0 12365 12580.0
pr1761 GRU CPU 32 128 64 forw and back 8956.0 25183 18300.0
pr1761 GRU CUDA 32 128 64 forward 4272.0 16321 1840.0
pr1761 GRU CUDA 32 128 64 backward 19251.0 54743 3390.0
pr1761 GRU CUDA 32 128 64 forw and back 26773.0 78143 5420.0
pr1761 GRU CPU 32 512 4 forward 650.534973144531 32 594.559997558594
pr1761 GRU CPU 32 512 4 backward 1806.0 880 2680.0
pr1761 GRU CPU 32 512 4 forw and back 3443.0 1772 3740.0
pr1761 GRU CUDA 32 512 4 forward 380.510009765625 1042 118.220001220703
pr1761 GRU CUDA 32 512 4 backward 1581.0 3594 223.589996337891
pr1761 GRU CUDA 32 512 4 forw and back 2205.0 5214 361.779998779297
pr1761 GRU CPU 32 512 16 forward 5215.0 128 2460.0
pr1761 GRU CPU 32 512 16 backward 7626.0 3305 11370.0
pr1761 GRU CPU 32 512 16 forw and back 14006.0 6634 15760.0
pr1761 GRU CUDA 32 512 16 forward 1169.0 4174 472.880004882812
pr1761 GRU CUDA 32 512 16 backward 5440.0 14131 884.940002441406
pr1761 GRU CUDA 32 512 16 forw and back 7500.0 20264 1380.0
pr1761 GRU CPU 32 512 32 forward 10522.0 256 4970.0
pr1761 GRU CPU 32 512 32 backward 15234.0 6537 22950.0
pr1761 GRU CPU 32 512 32 forw and back 27659.0 13114 31770.0
pr1761 GRU CUDA 32 512 32 forward 2190.0 8350 945.75
pr1761 GRU CUDA 32 512 32 backward 10514.0 28179 1730.0
pr1761 GRU CUDA 32 512 32 forw and back 14553.0 40328 2750.0
pr1761 GRU CPU 32 512 64 forward 21181.0 512 9990.0
pr1761 GRU CPU 32 512 64 backward 30551.0 13001 46120.0
pr1761 GRU CPU 32 512 64 forw and back 55245.0 26074 63800.0
pr1761 GRU CUDA 32 512 64 forward 4341.0 16702 1850.0
pr1761 GRU CUDA 32 512 64 backward 20676.0 56275 3450.0
pr1761 GRU CUDA 32 512 64 forw and back 28481.0 80456 5490.0
pr1761 GRU CPU 128 32 4 forward 45.9440002441406 25 39.1100006103516
pr1761 GRU CPU 128 32 4 backward 227.994995117187 844 409.829986572266
pr1761 GRU CPU 128 32 4 forw and back 436.911987304687 1721 535.919982910156
pr1761 GRU CUDA 128 32 4 forward 398.092010498047 1005 117.639999389648
pr1761 GRU CUDA 128 32 4 backward 1530.0 3466 219.279998779297
pr1761 GRU CUDA 128 32 4 forw and back 2135.0 5025 355.75
pr1761 GRU CPU 128 32 16 forward 193.138000488281 97 165.199996948242
pr1761 GRU CPU 128 32 16 backward 845.77099609375 3149 1670.0
pr1761 GRU CPU 128 32 16 forw and back 1542.0 6415 2150.0
pr1761 GRU CUDA 128 32 16 forward 1207.0 4017 470.420013427734
pr1761 GRU CUDA 128 32 16 backward 5151.0 13607 867.5
pr1761 GRU CUDA 128 32 16 forw and back 7240.0 19535 1360.0
pr1761 GRU CPU 128 32 32 forward 394.700012207031 193 333.329986572266
pr1761 GRU CPU 128 32 32 backward 1702.0 6221 3360.0
pr1761 GRU CPU 128 32 32 forw and back 3038.0 12671 4330.0
pr1761 GRU CUDA 128 32 32 forward 2262.0 8033 940.799987792969
pr1761 GRU CUDA 128 32 32 backward 9877.0 27127 1690.0
pr1761 GRU CUDA 128 32 32 forw and back 13811.0 38879 2710.0
pr1761 GRU CPU 128 32 64 forward 796.807983398438 385 669.590026855469
pr1761 GRU CPU 128 32 64 backward 3573.0 12365 6750.0
pr1761 GRU CPU 128 32 64 forw and back 6142.0 25183 8680.0
pr1761 GRU CUDA 128 32 64 forward 4383.0 16065 1840.0
pr1761 GRU CUDA 128 32 64 backward 19106.0 54167 3380.0
pr1761 GRU CUDA 128 32 64 forw and back 26794.0 77567 5410.0
pr1761 GRU CPU 128 128 4 forward 459.966003417969 25 151.110000610352
pr1761 GRU CPU 128 128 4 backward 1041.0 852 1170.0
pr1761 GRU CPU 128 128 4 forw and back 1103.0 1725 1340.0
pr1761 GRU CUDA 128 128 4 forward 389.419006347656 1021 117.889999389648
pr1761 GRU CUDA 128 128 4 backward 1522.0 3502 219.839996337891
pr1761 GRU CUDA 128 128 4 forw and back 2154.0 5061 356.309997558594
pr1761 GRU CPU 128 128 16 forward 1207.0 97 640.200012207031
pr1761 GRU CPU 128 128 16 backward 4282.0 3181 4870.0
pr1761 GRU CPU 128 128 16 forw and back 7375.0 6431 5550.0
pr1761 GRU CUDA 128 128 16 forward 1176.0 4081 471.420013427734
pr1761 GRU CUDA 128 128 16 backward 5240.0 13751 869.75
pr1761 GRU CUDA 128 128 16 forw and back 7321.0 19679 1360.0
pr1761 GRU CPU 128 128 32 forward 3869.0 193 1260.0
pr1761 GRU CPU 128 128 32 backward 6375.0 6285 9810.0
pr1761 GRU CPU 128 128 32 forw and back 12397.0 12703 11170.0
pr1761 GRU CUDA 128 128 32 forward 2221.0 8161 942.799987792969
pr1761 GRU CUDA 128 128 32 backward 10115.0 27415 1700.0
pr1761 GRU CUDA 128 128 32 forw and back 14042.0 39167 2710.0
pr1761 GRU CPU 128 128 64 forward 7791.0 385 2540.0
pr1761 GRU CPU 128 128 64 backward 17688.0 12493 19690.0
pr1761 GRU CPU 128 128 64 forw and back 29597.0 25247 22410.0
pr1761 GRU CUDA 128 128 64 forward 4373.0 16321 1840.0
pr1761 GRU CUDA 128 128 64 backward 19570.0 54743 3390.0
pr1761 GRU CUDA 128 128 64 forw and back 26788.0 78143 5420.0
pr1761 GRU CPU 128 512 4 forward 1295.0 32 594.559997558594
pr1761 GRU CPU 128 512 4 backward 2074.0 880 4240.0
pr1761 GRU CPU 128 512 4 forw and back 3677.0 1772 4560.0
pr1761 GRU CUDA 128 512 4 forward 389.225006103516 1042 118.220001220703
pr1761 GRU CUDA 128 512 4 backward 1583.0 3594 223.589996337891
pr1761 GRU CUDA 128 512 4 forw and back 2208.0 5214 361.779998779297
pr1761 GRU CPU 128 512 16 forward 5538.0 128 2460.0
pr1761 GRU CPU 128 512 16 backward 8798.0 3305 17640.0
pr1761 GRU CPU 128 512 16 forw and back 14756.0 6634 19030.0
pr1761 GRU CUDA 128 512 16 forward 1176.0 4174 472.880004882812
pr1761 GRU CUDA 128 512 16 backward 5420.0 14131 884.940002441406
pr1761 GRU CUDA 128 512 16 forw and back 7479.0 20264 1380.0
pr1761 GRU CPU 128 512 32 forward 11260.0 256 4970.0
pr1761 GRU CPU 128 512 32 backward 17652.0 6537 35500.0
pr1761 GRU CPU 128 512 32 forw and back 29339.0 13114 38320.0
pr1761 GRU CUDA 128 512 32 forward 2246.0 8350 945.75
pr1761 GRU CUDA 128 512 32 backward 10603.0 28179 1730.0
pr1761 GRU CUDA 128 512 32 forw and back 14578.0 40328 2750.0
pr1761 GRU CPU 128 512 64 forward 22706.0 512 9990.0
pr1761 GRU CPU 128 512 64 backward 35436.0 13001 71240.0
pr1761 GRU CPU 128 512 64 forw and back 59069.0 26074 76920.0
pr1761 GRU CUDA 128 512 64 forward 4426.0 16702 1850.0
pr1761 GRU CUDA 128 512 64 backward 20861.0 56275 3450.0
pr1761 GRU CUDA 128 512 64 forw and back 28711.0 80456 5490.0
pr1761 GRU CPU 512 32 4 forward 308.928985595703 25 39.1100006103516
pr1761 GRU CPU 512 32 4 backward 949.953002929688 859 1020.0
pr1761 GRU CPU 512 32 4 forw and back 1598.0 1732 979.059997558594
pr1761 GRU CUDA 512 32 4 forward 396.804992675781 1025 117.949996948242
pr1761 GRU CUDA 512 32 4 backward 1551.0 3498 219.779998779297
pr1761 GRU CUDA 512 32 4 forw and back 2157.0 5077 356.559997558594
pr1761 GRU CPU 512 32 16 forward 1251.0 97 165.199996948242
pr1761 GRU CPU 512 32 16 backward 3884.0 3212 4260.0
pr1761 GRU CPU 512 32 16 forw and back 6193.0 6462 3990.0
pr1761 GRU CUDA 512 32 16 forward 1193.0 4097 471.670013427734
pr1761 GRU CUDA 512 32 16 backward 5133.0 13735 869.5
pr1761 GRU CUDA 512 32 16 forw and back 7216.0 19743 1360.0
pr1761 GRU CPU 512 32 32 forward 2510.0 193 333.329986572266
pr1761 GRU CPU 512 32 32 backward 7843.0 6348 8570.0
pr1761 GRU CPU 512 32 32 forw and back 12336.0 12766 8040.0
pr1761 GRU CUDA 512 32 32 forward 2221.0 8193 943.299987792969
pr1761 GRU CUDA 512 32 32 backward 9822.0 27383 1700.0
pr1761 GRU CUDA 512 32 32 forw and back 13770.0 39295 2710.0
pr1761 GRU CPU 512 32 64 forward 5059.0 385 669.590026855469
pr1761 GRU CPU 512 32 64 backward 15663.0 12620 17200.0
pr1761 GRU CPU 512 32 64 forw and back 24714.0 25374 16129.9990234375
pr1761 GRU CUDA 512 32 64 forward 4358.0 16385 1840.0
pr1761 GRU CUDA 512 32 64 backward 19248.0 54679 3390.0
pr1761 GRU CUDA 512 32 64 forw and back 26556.0 78399 5420.0
pr1761 GRU CPU 512 128 4 forward 535.008972167969 25 151.110000610352
pr1761 GRU CPU 512 128 4 backward 1459.0 859 2920.0
pr1761 GRU CPU 512 128 4 forw and back 2211.0 1732 2330.0
pr1761 GRU CUDA 512 128 4 forward 388.5419921875 1041 118.199996948242
pr1761 GRU CUDA 512 128 4 backward 1544.0 3534 220.339996337891
pr1761 GRU CUDA 512 128 4 forw and back 2170.0 5113 357.119995117187
pr1761 GRU CPU 512 128 16 forward 2239.0 97 640.200012207031
pr1761 GRU CPU 512 128 16 backward 6088.0 3212 11960.0
pr1761 GRU CPU 512 128 16 forw and back 8836.0 6462 9640.0
pr1761 GRU CUDA 512 128 16 forward 1186.0 4161 472.670013427734
pr1761 GRU CUDA 512 128 16 backward 5222.0 13879 871.75
pr1761 GRU CUDA 512 128 16 forw and back 7237.0 19887 1360.0
pr1761 GRU CPU 512 128 32 forward 4584.0 193 1260.0
pr1761 GRU CPU 512 128 32 backward 12185.0 6348 24020.0
pr1761 GRU CPU 512 128 32 forw and back 17606.0 12766 19380.0
pr1761 GRU CUDA 512 128 32 forward 2194.0 8321 945.299987792969
pr1761 GRU CUDA 512 128 32 backward 10055.0 27671 1700.0
pr1761 GRU CUDA 512 128 32 forw and back 13803.0 39583 2720.0
pr1761 GRU CPU 512 128 64 forward 9301.0 385 2540.0
pr1761 GRU CPU 512 128 64 backward 24435.0 12620 48140.0
pr1761 GRU CPU 512 128 64 forw and back 35196.0 25374 38870.0
pr1761 GRU CUDA 512 128 64 forward 4295.0 16641 1850.0
pr1761 GRU CUDA 512 128 64 backward 19502.0 55255 3400.0
pr1761 GRU CUDA 512 128 64 forw and back 26931.0 78975 5430.0
pr1761 GRU CPU 512 512 4 forward 1608.0 32 594.559997558594
pr1761 GRU CPU 512 512 4 backward 3340.0 887 10480.0
pr1761 GRU CPU 512 512 4 forw and back 5084.0 1779 7800.0
pr1761 GRU CUDA 512 512 4 forward 435.325988769531 1062 118.529998779297
pr1761 GRU CUDA 512 512 4 backward 1656.0 3626 224.089996337891
pr1761 GRU CUDA 512 512 4 forw and back 2307.0 5266 362.589996337891
pr1761 GRU CPU 512 512 16 forward 7007.0 128 2460.0
pr1761 GRU CPU 512 512 16 backward 15175.0 3336 42730.0
pr1761 GRU CPU 512 512 16 forw and back 19999.0 6665 32119.998046875
pr1761 GRU CUDA 512 512 16 forward 1338.0 4254 474.119995117187
pr1761 GRU CUDA 512 512 16 backward 5731.0 14259 886.940002441406
pr1761 GRU CUDA 512 512 16 forw and back 7789.0 20472 1380.0
pr1761 GRU CPU 512 512 32 forward 14150.0 256 4970.0
pr1761 GRU CPU 512 512 32 backward 30405.0 6600 85710.0
pr1761 GRU CPU 512 512 32 forw and back 39853.0 13177 64540.0
pr1761 GRU CUDA 512 512 32 forward 2516.0 8510 948.25
pr1761 GRU CUDA 512 512 32 backward 11757.0 28435 1730.0
pr1761 GRU CUDA 512 512 32 forw and back 15063.0 40744 2760.0
pr1761 GRU CPU 512 512 64 forward 28400.0 512 9990.0
pr1761 GRU CPU 512 512 64 backward 60943.0 13128 171690.0
pr1761 GRU CPU 512 512 64 forw and back 79459.0 26201 129369.9921875
pr1761 GRU CUDA 512 512 64 forward 4888.0 17022 1850.0
pr1761 GRU CUDA 512 512 64 backward 22380.0 56787 3460.0
pr1761 GRU CUDA 512 512 64 forw and back 29432.0 81288 5500.0