Gist: mkschleg/d576a3b5e8514178a08829f8d3cda4f4
Created: March 27, 2022 20:47
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_VERSION = 1.7
  JULIA_LOAD_PATH = /home/matt/Documents/Flux.jl/test/:/home/matt/Documents/Flux.jl/experiment/:
RNN Vec CPU n=2, ts=1
forward
  264.172 ns (4 allocations: 288 bytes)
backward
  9.584 μs (117 allocations: 6.41 KiB)
forw and back
  19.113 μs (212 allocations: 31.94 KiB)
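Each result line above follows the BenchmarkTools `@btime` format: minimum time, number of allocations, and allocated memory. A minimal Python sketch for turning such a line into plain numbers, assuming exactly this format; `parse_btime` and the unit tables are hypothetical helpers, not part of the benchmark script that produced this log.

```python
import re

# Unit multipliers for the units that appear in this log.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
TIME_UNITS = {"ns": 1e-9, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}
MEM_UNITS = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Longest units first so that "ns"/"ms" are tried before the bare "s".
_time_alt = "|".join(map(re.escape, sorted(TIME_UNITS, key=len, reverse=True)))
_mem_alt = "|".join(map(re.escape, MEM_UNITS))

# Matches e.g. "264.172 ns (4 allocations: 288 bytes)".
BTIME_RE = re.compile(
    rf"^\s*([\d.]+)\s*({_time_alt})\s*\((\d+)\s+allocations?:\s*([\d.]+)\s*({_mem_alt})\)"
)

def parse_btime(line):
    """Return (seconds, allocations, bytes) for one @btime result line."""
    m = BTIME_RE.match(line)
    if m is None:
        raise ValueError(f"not a @btime line: {line!r}")
    t, t_unit, allocs, mem, m_unit = m.groups()
    return float(t) * TIME_UNITS[t_unit], int(allocs), float(mem) * MEM_UNITS[m_unit]

# The three results for "RNN Vec CPU n=2, ts=1", copied from the log.
for label, line in [
    ("forward", "264.172 ns (4 allocations: 288 bytes)"),
    ("backward", "9.584 μs (117 allocations: 6.41 KiB)"),
    ("forw and back", "19.113 μs (212 allocations: 31.94 KiB)"),
]:
    secs, allocs, nbytes = parse_btime(line)
    print(f"{label}: {secs * 1e6:.3f} μs, {allocs} allocs, {nbytes:.0f} bytes")
```

Converting KiB/MiB/GiB back to bytes uses powers of 1024, which matches the binary prefixes BenchmarkTools prints.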
RNN Vec CUDA n=2, ts=1
forward
  91.317 μs (93 allocations: 6.45 KiB)
backward
  219.462 μs (377 allocations: 18.89 KiB)
forw and back
  335.586 μs (617 allocations: 39.53 KiB)
RNN Vec CPU n=2, ts=4
forward
  700.552 ns (13 allocations: 1.00 KiB)
backward
  27.689 μs (393 allocations: 22.86 KiB)
forw and back
  46.552 μs (584 allocations: 92.36 KiB)
RNN Vec CUDA n=2, ts=4
forward
  171.984 μs (369 allocations: 25.64 KiB)
backward
  496.415 μs (1478 allocations: 78.22 KiB)
forw and back
  725.018 μs (2195 allocations: 141.38 KiB)
RNN Vec CPU n=2, ts=16
forward
  2.415 μs (49 allocations: 3.91 KiB)
backward
  88.227 μs (1498 allocations: 88.41 KiB)
forw and back
  136.871 μs (2074 allocations: 333.61 KiB)
RNN Vec CUDA n=2, ts=16
forward
  461.223 μs (1473 allocations: 102.42 KiB)
backward
  1.469 ms (5882 allocations: 315.09 KiB)
forw and back
  2.131 ms (8507 allocations: 547.91 KiB)
RNN Vec CPU n=2, ts=64
forward
  9.167 μs (193 allocations: 15.55 KiB)
backward
  326.686 μs (5914 allocations: 350.95 KiB)
forw and back
  494.513 μs (8026 allocations: 1.27 MiB)
RNN Vec CUDA n=2, ts=64
forward
  1.599 ms (5889 allocations: 409.56 KiB)
backward
  5.257 ms (23499 allocations: 1.23 MiB)
forw and back
  7.637 ms (33757 allocations: 2.12 MiB)
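For the n=2 `RNN Vec` runs, comparing the `backward` time with the `forward` time shows how much more expensive the gradient measurement is on each device. A small Python sketch, with the values copied by hand from the log above and converted to microseconds; `vec_n2` and `backward_over_forward` are illustrative names only.

```python
# RNN Vec, n=2 minimum times copied from the log, in microseconds.
# Each entry maps ts -> (forward, backward).
vec_n2 = {
    "CPU": {1: (0.264172, 9.584), 4: (0.700552, 27.689),
            16: (2.415, 88.227), 64: (9.167, 326.686)},
    "CUDA": {1: (91.317, 219.462), 4: (171.984, 496.415),
             16: (461.223, 1469.0), 64: (1599.0, 5257.0)},
}

def backward_over_forward(rows):
    """Ratio of the 'backward' time to the 'forward' time for each ts."""
    return {ts: bwd / fwd for ts, (fwd, bwd) in rows.items()}

for device, rows in vec_n2.items():
    ratios = backward_over_forward(rows)
    print(device, {ts: round(r, 1) for ts, r in ratios.items()})
```

With these numbers the CPU `backward` measurement is about 35 to 40 times the `forward` one, while on CUDA it is only about 2.4 to 3.3 times.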
RNN Vec CPU n=20, ts=1
forward
  2.141 μs (4 allocations: 3.73 KiB)
backward
  13.331 μs (117 allocations: 16.67 KiB)
forw and back
  25.025 μs (212 allocations: 49.03 KiB)
RNN Vec CUDA n=20, ts=1
forward
  94.467 μs (93 allocations: 6.45 KiB)
backward
  218.825 μs (377 allocations: 18.89 KiB)
forw and back
  337.910 μs (617 allocations: 39.53 KiB)
RNN Vec CPU n=20, ts=4
forward
  9.432 μs (13 allocations: 19.64 KiB)
backward
  46.884 μs (393 allocations: 83.42 KiB)
forw and back
  74.057 μs (584 allocations: 190.12 KiB)
RNN Vec CUDA n=20, ts=4
forward
  175.496 μs (369 allocations: 25.64 KiB)
backward
  497.927 μs (1481 allocations: 78.27 KiB)
forw and back
  732.290 μs (2198 allocations: 141.42 KiB)
RNN Vec CPU n=20, ts=16
forward
  39.080 μs (49 allocations: 83.30 KiB)
backward
  172.115 μs (1498 allocations: 350.16 KiB)
forw and back
  259.511 μs (2074 allocations: 754.06 KiB)
RNN Vec CUDA n=20, ts=16
forward
  470.589 μs (1473 allocations: 102.42 KiB)
backward
  1.477 ms (5897 allocations: 315.33 KiB)
forw and back
  2.144 ms (8522 allocations: 548.14 KiB)
RNN Vec CPU n=20, ts=64
forward
  157.590 μs (193 allocations: 337.94 KiB)
backward
  672.133 μs (5914 allocations: 1.38 MiB)
forw and back
  999.957 μs (8026 allocations: 2.94 MiB)
RNN Vec CUDA n=20, ts=64
forward
  1.639 ms (5889 allocations: 409.56 KiB)
backward
  5.262 ms (23562 allocations: 1.23 MiB)
forw and back
  7.663 ms (33820 allocations: 2.12 MiB)
RNN Vec CPU n=200, ts=1
forward
  396.093 μs (6 allocations: 313.53 KiB)
backward
  431.399 μs (122 allocations: 946.98 KiB)
forw and back
  657.308 μs (221 allocations: 1.56 MiB)
RNN Vec CUDA n=200, ts=1
forward
  103.224 μs (94 allocations: 6.47 KiB)
backward
  236.049 μs (380 allocations: 18.94 KiB)
forw and back
  372.350 μs (664 allocations: 41.84 KiB)
RNN Vec CPU n=200, ts=4
forward
  899.520 μs (24 allocations: 1.68 MiB)
backward
  1.427 ms (422 allocations: 5.52 MiB)
forw and back
  2.166 ms (635 allocations: 8.95 MiB)
RNN Vec CUDA n=200, ts=4
forward
  203.731 μs (373 allocations: 25.70 KiB)
backward
  542.370 μs (1496 allocations: 78.50 KiB)
forw and back
  809.731 μs (2266 allocations: 144.06 KiB)
RNN Vec CPU n=200, ts=16
forward
  3.709 ms (96 allocations: 7.17 MiB)
backward
  6.234 ms (1623 allocations: 23.92 MiB)
forw and back
  9.189 ms (2293 allocations: 38.51 MiB)
RNN Vec CUDA n=200, ts=16
forward
  563.140 μs (1489 allocations: 102.67 KiB)
backward
  1.662 ms (5960 allocations: 316.31 KiB)
forw and back
  2.424 ms (8674 allocations: 552.09 KiB)
RNN Vec CPU n=200, ts=64
forward
  15.238 ms (384 allocations: 29.15 MiB)
backward
  25.381 ms (6423 allocations: 97.52 MiB)
forw and back
  39.505 ms (8917 allocations: 156.73 MiB)
RNN Vec CUDA n=200, ts=64
forward
  1.961 ms (5953 allocations: 410.56 KiB)
backward
  6.072 ms (23817 allocations: 1.24 MiB)
forw and back
  9.022 ms (34308 allocations: 2.13 MiB)
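Dividing each time by `ts` gives a rough per-timestep cost, under the simplifying assumption that the time is spread evenly over the sequence. A Python sketch for the n=200 `RNN Vec` `forward` rows, with values copied from the log in microseconds; the names `forward` and `per_timestep` are hypothetical.

```python
# "forward" minimum times for RNN Vec, n=200, copied from the log (μs).
forward = {
    "CPU": {1: 396.093, 4: 899.520, 16: 3709.0, 64: 15238.0},
    "CUDA": {1: 103.224, 4: 203.731, 16: 563.140, 64: 1961.0},
}

def per_timestep(times):
    """Average cost of one timestep, assuming time is spread evenly over ts."""
    return {ts: t / ts for ts, t in times.items()}

for device, times in forward.items():
    costs = per_timestep(times)
    row = ", ".join(f"ts={ts}: {c:.1f}" for ts, c in costs.items())
    print(f"{device}: {row} (μs per timestep)")
```

In these rows the CPU cost per timestep settles at roughly 225 to 238 μs once ts is 4 or more, while the CUDA cost per timestep keeps falling as ts grows.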
RNN Vec CPU n=1000, ts=1
forward
  7.714 ms (6 allocations: 7.63 MiB)
backward
  13.385 ms (122 allocations: 22.91 MiB)
forw and back
  19.695 ms (221 allocations: 38.20 MiB)
RNN Vec CUDA n=1000, ts=1
forward
  527.965 μs (168 allocations: 9.81 KiB)
backward
  1.195 ms (476 allocations: 22.62 KiB)
forw and back
  1.896 ms (834 allocations: 48.88 KiB)
RNN Vec CPU n=1000, ts=4
forward
  62.978 ms (24 allocations: 41.97 MiB)
backward
  97.428 ms (422 allocations: 137.40 MiB)
forw and back
  145.270 ms (635 allocations: 221.40 MiB)
RNN Vec CUDA n=1000, ts=4
forward
  2.298 ms (537 allocations: 30.45 KiB)
backward
  4.818 ms (1745 allocations: 84.58 KiB)
forw and back
  7.162 ms (2679 allocations: 154.89 KiB)
RNN Vec CPU n=1000, ts=16
forward
  266.436 ms (96 allocations: 179.30 MiB)
backward
  424.925 ms (1623 allocations: 595.36 MiB)
forw and back
  621.772 ms (2293 allocations: 954.19 MiB)
RNN Vec CUDA n=1000, ts=16
forward
  7.745 ms (2013 allocations: 113.05 KiB)
backward
  19.628 ms (6821 allocations: 331.95 KiB)
forw and back
  28.773 ms (10059 allocations: 578.11 KiB)
RNN Vec CPU n=1000, ts=64
forward
  1.092 s (384 allocations: 728.62 MiB)
backward
  1.725 s (6423 allocations: 2.37 GiB)
forw and back
  2.519 s (8917 allocations: 3.79 GiB)
RNN Vec CUDA n=1000, ts=64
forward
  30.842 ms (7917 allocations: 443.44 KiB)
backward
  79.725 ms (27126 allocations: 1.29 MiB)
forw and back
  117.308 ms (39581 allocations: 2.22 MiB)
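Across the `RNN Vec` sweep, which device wins depends on the hidden size n. A Python sketch that picks the faster device for the ts=64 "forw and back" rows, values copied from the log and converted to microseconds; `ts64` and `faster_device` are illustrative names, not part of the benchmark.

```python
# "forw and back" minimum times for RNN Vec at ts=64, copied from the log (μs).
ts64 = {
    2: {"CPU": 494.513, "CUDA": 7637.0},
    20: {"CPU": 999.957, "CUDA": 7663.0},
    200: {"CPU": 39505.0, "CUDA": 9022.0},
    1000: {"CPU": 2519000.0, "CUDA": 117308.0},
}

def faster_device(row):
    """Name of the device with the smaller time, plus the CPU/CUDA time ratio."""
    ratio = row["CPU"] / row["CUDA"]
    return ("CUDA" if ratio > 1 else "CPU"), ratio

for n, row in ts64.items():
    device, ratio = faster_device(row)
    print(f"n={n:>4}: {device} faster (CPU/CUDA time ratio {ratio:.2f})")
```

In this sweep the CPU is faster at n=2 and n=20 and CUDA is faster at n=200 and n=1000 (about 21x at n=1000). The log has no sizes between 20 and 200, so the exact crossover point is not measured.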
RNN Block CPU n=2, ts=1
forward
  1.175 μs (14 allocations: 640 bytes)
backward
  10.660 μs (132 allocations: 6.89 KiB)
forw and back
  29.692 μs (261 allocations: 36.22 KiB)
RNN Block CUDA n=2, ts=1
forward
  124.708 μs (137 allocations: 8.02 KiB)
backward
  249.134 μs (439 allocations: 21.31 KiB)
forw and back
  417.532 μs (747 allocations: 46.67 KiB)
RNN Block CPU n=2, ts=4
forward
  1.690 μs (23 allocations: 1.39 KiB)
backward
  28.258 μs (423 allocations: 24.09 KiB)
forw and back
  58.840 μs (642 allocations: 98.61 KiB)
RNN Block CUDA n=2, ts=4
forward
  214.211 μs (503 allocations: 30.58 KiB)
backward
  547.575 μs (1678 allocations: 86.39 KiB)
forw and back
  837.096 μs (2466 allocations: 150.77 KiB)
RNN Block CPU n=2, ts=16
forward
  3.546 μs (59 allocations: 4.50 KiB)
backward
  92.498 μs (1588 allocations: 92.56 KiB)
forw and back
  155.328 μs (2168 allocations: 347.64 KiB)
RNN Block CUDA n=2, ts=16
forward
  554.636 μs (1967 allocations: 120.86 KiB)
backward
  1.607 ms (6634 allocations: 346.55 KiB)
forw and back
  2.333 ms (9342 allocations: 566.83 KiB)
RNN Block CPU n=2, ts=64
forward
  10.812 μs (203 allocations: 16.88 KiB)
backward
  345.428 μs (6244 allocations: 366.75 KiB)
forw and back
  534.571 μs (8264 allocations: 1.31 MiB)
RNN Block CUDA n=2, ts=64
forward
  1.886 ms (7823 allocations: 482.00 KiB)
backward
  5.731 ms (26459 allocations: 1.35 MiB)
forw and back
  8.162 ms (36848 allocations: 2.18 MiB)
RNN Block CPU n=20, ts=1
forward
  3.241 μs (14 allocations: 5.77 KiB)
backward
  14.694 μs (132 allocations: 18.84 KiB)
forw and back
  35.691 μs (261 allocations: 56.69 KiB)
RNN Block CUDA n=20, ts=1
forward
  123.940 μs (137 allocations: 8.02 KiB)
backward
  250.611 μs (439 allocations: 21.31 KiB)
forw and back
  416.753 μs (747 allocations: 46.67 KiB)
RNN Block CPU n=20, ts=4
forward
  11.016 μs (23 allocations: 26.28 KiB)
backward
  48.984 μs (423 allocations: 90.91 KiB)
forw and back
  88.382 μs (642 allocations: 203.81 KiB)
RNN Block CUDA n=20, ts=4
forward
  222.157 μs (503 allocations: 30.58 KiB)
backward
  550.462 μs (1681 allocations: 86.44 KiB)
forw and back
  855.562 μs (2512 allocations: 153.28 KiB)
RNN Block CPU n=20, ts=16
forward
  42.542 μs (60 allocations: 108.61 KiB)
backward
  179.538 μs (1589 allocations: 379.05 KiB)
forw and back
  280.389 μs (2170 allocations: 792.23 KiB)
RNN Block CUDA n=20, ts=16
forward
  568.705 μs (1967 allocations: 120.86 KiB)
backward
  1.617 ms (6649 allocations: 346.78 KiB)
forw and back
  2.371 ms (9400 allocations: 569.53 KiB)
RNN Block CPU n=20, ts=64
forward
  163.198 μs (204 allocations: 438.25 KiB)
backward
  704.353 μs (6245 allocations: 1.50 MiB)
forw and back
  1.056 ms (8266 allocations: 3.07 MiB)
RNN Block CUDA n=20, ts=64
forward
  1.954 ms (7823 allocations: 482.00 KiB)
backward
  5.747 ms (26522 allocations: 1.36 MiB)
forw and back
  8.229 ms (36954 allocations: 2.18 MiB)
RNN Block CPU n=200, ts=1
forward
  206.012 μs (17 allocations: 470.09 KiB)
backward
  289.603 μs (138 allocations: 1.08 MiB)
forw and back
  455.803 μs (272 allocations: 1.87 MiB)
RNN Block CUDA n=200, ts=1
forward
  131.274 μs (140 allocations: 8.06 KiB)
backward
  262.678 μs (442 allocations: 21.36 KiB)
forw and back
  450.127 μs (797 allocations: 49.25 KiB)
RNN Block CPU n=200, ts=4
forward
  926.796 μs (35 allocations: 2.29 MiB)
backward
  1.545 ms (453 allocations: 6.14 MiB)
forw and back
  2.288 ms (692 allocations: 9.72 MiB)
RNN Block CUDA n=200, ts=4
forward
  245.877 μs (515 allocations: 30.77 KiB)
backward
  601.164 μs (1696 allocations: 86.67 KiB)
forw and back
  924.747 μs (2543 allocations: 153.77 KiB)
RNN Block CPU n=200, ts=16
forward
  3.883 ms (107 allocations: 9.62 MiB)
backward
  6.601 ms (1714 allocations: 26.37 MiB)
forw and back
  9.613 ms (2374 allocations: 41.11 MiB)
RNN Block CUDA n=200, ts=16
forward
  655.443 μs (2015 allocations: 121.61 KiB)
backward
  1.815 ms (6712 allocations: 347.77 KiB)
forw and back
  2.756 ms (9527 allocations: 571.52 KiB)
RNN Block CPU n=200, ts=64
forward
  15.904 ms (395 allocations: 38.92 MiB)
backward
  26.704 ms (6754 allocations: 107.30 MiB)
forw and back
  41.180 ms (9094 allocations: 166.69 MiB)
RNN Block CUDA n=200, ts=64
forward
  2.262 ms (8015 allocations: 485.00 KiB)
backward
  6.885 ms (26777 allocations: 1.36 MiB)
forw and back
  9.792 ms (37465 allocations: 2.19 MiB)
RNN Block CPU n=1000, ts=1
forward
  8.044 ms (19 allocations: 11.45 MiB)
backward
  13.822 ms (138 allocations: 26.72 MiB)
forw and back
  20.431 ms (274 allocations: 45.83 MiB)
RNN Block CUDA n=1000, ts=1
forward
  611.569 μs (216 allocations: 11.44 KiB)
backward
  1.185 ms (538 allocations: 25.05 KiB)
forw and back
  1.819 ms (969 allocations: 56.31 KiB)
RNN Block CPU n=1000, ts=4
forward
  66.125 ms (37 allocations: 57.23 MiB)
backward
  101.235 ms (453 allocations: 152.66 MiB)
forw and back
  150.217 ms (694 allocations: 240.47 MiB)
RNN Block CUDA n=1000, ts=4
forward
  2.263 ms (681 allocations: 35.55 KiB)
backward
  4.997 ms (1945 allocations: 92.75 KiB)
forw and back
  7.396 ms (2958 allocations: 164.62 KiB)
RNN Block CPU n=1000, ts=16
forward
  279.050 ms (109 allocations: 240.33 MiB)
backward
  441.440 ms (1714 allocations: 656.40 MiB)
forw and back
  642.376 ms (2376 allocations: 1019.05 MiB)
RNN Block CUDA n=1000, ts=16
forward
  8.139 ms (2541 allocations: 132.02 KiB)
backward
  20.460 ms (7573 allocations: 363.41 KiB)
forw and back
  29.609 ms (10914 allocations: 597.56 KiB)
RNN Block CPU n=1000, ts=64
forward
  1.131 s (397 allocations: 972.76 MiB)
backward
  1.792 s (6754 allocations: 2.61 GiB)
forw and back
  3.213 s (9096 allocations: 4.04 GiB)
RNN Block CUDA n=1000, ts=64
forward
  32.230 ms (9981 allocations: 517.91 KiB)
backward
  83.352 ms (30086 allocations: 1.41 MiB)
forw and back
  121.319 ms (42740 allocations: 2.27 MiB)
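Each `RNN Block` configuration has an `RNN Vec` counterpart earlier in the log, so the two input layouts can be compared directly. A Python sketch computing how much slower the Block variant is for the ts=64 "forw and back" rows, values copied from the log in microseconds; `vec`, `block`, and `relative_overhead` are hypothetical names.

```python
# "forw and back" minimum times at ts=64, copied from the log (μs), keyed by n.
vec = {"CPU": {2: 494.513, 20: 999.957, 200: 39505.0, 1000: 2519000.0},
       "CUDA": {2: 7637.0, 20: 7663.0, 200: 9022.0, 1000: 117308.0}}
block = {"CPU": {2: 534.571, 20: 1056.0, 200: 41180.0, 1000: 3213000.0},
         "CUDA": {2: 8162.0, 20: 8229.0, 200: 9792.0, 1000: 121319.0}}

def relative_overhead(a, b):
    """Percent by which time b exceeds time a, for each hidden size n."""
    return {n: 100.0 * (b[n] / a[n] - 1.0) for n in a}

for device in ("CPU", "CUDA"):
    overhead = relative_overhead(vec[device], block[device])
    print(device, {n: f"{p:+.1f}%" for n, p in overhead.items()})
```

For every ts=64 configuration the Block variant is slower than the Vec variant, by roughly 3% to 9%, except CPU n=1000, where it is about 28% slower.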