@mkschleg
Created March 27, 2022 20:47
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_VERSION = 1.7
  JULIA_LOAD_PATH = /home/matt/Documents/Flux.jl/test/:/home/matt/Documents/Flux.jl/experiment/:
RNN Vec CPU n=2, ts=1
forward
264.172 ns (4 allocations: 288 bytes)
backward
9.584 μs (117 allocations: 6.41 KiB)
forw and back
19.113 μs (212 allocations: 31.94 KiB)
RNN Vec CUDA n=2, ts=1
forward
91.317 μs (93 allocations: 6.45 KiB)
backward
219.462 μs (377 allocations: 18.89 KiB)
forw and back
335.586 μs (617 allocations: 39.53 KiB)
RNN Vec CPU n=2, ts=4
forward
700.552 ns (13 allocations: 1.00 KiB)
backward
27.689 μs (393 allocations: 22.86 KiB)
forw and back
46.552 μs (584 allocations: 92.36 KiB)
RNN Vec CUDA n=2, ts=4
forward
171.984 μs (369 allocations: 25.64 KiB)
backward
496.415 μs (1478 allocations: 78.22 KiB)
forw and back
725.018 μs (2195 allocations: 141.38 KiB)
RNN Vec CPU n=2, ts=16
forward
2.415 μs (49 allocations: 3.91 KiB)
backward
88.227 μs (1498 allocations: 88.41 KiB)
forw and back
136.871 μs (2074 allocations: 333.61 KiB)
RNN Vec CUDA n=2, ts=16
forward
461.223 μs (1473 allocations: 102.42 KiB)
backward
1.469 ms (5882 allocations: 315.09 KiB)
forw and back
2.131 ms (8507 allocations: 547.91 KiB)
RNN Vec CPU n=2, ts=64
forward
9.167 μs (193 allocations: 15.55 KiB)
backward
326.686 μs (5914 allocations: 350.95 KiB)
forw and back
494.513 μs (8026 allocations: 1.27 MiB)
RNN Vec CUDA n=2, ts=64
forward
1.599 ms (5889 allocations: 409.56 KiB)
backward
5.257 ms (23499 allocations: 1.23 MiB)
forw and back
7.637 ms (33757 allocations: 2.12 MiB)
RNN Vec CPU n=20, ts=1
forward
2.141 μs (4 allocations: 3.73 KiB)
backward
13.331 μs (117 allocations: 16.67 KiB)
forw and back
25.025 μs (212 allocations: 49.03 KiB)
RNN Vec CUDA n=20, ts=1
forward
94.467 μs (93 allocations: 6.45 KiB)
backward
218.825 μs (377 allocations: 18.89 KiB)
forw and back
337.910 μs (617 allocations: 39.53 KiB)
RNN Vec CPU n=20, ts=4
forward
9.432 μs (13 allocations: 19.64 KiB)
backward
46.884 μs (393 allocations: 83.42 KiB)
forw and back
74.057 μs (584 allocations: 190.12 KiB)
RNN Vec CUDA n=20, ts=4
forward
175.496 μs (369 allocations: 25.64 KiB)
backward
497.927 μs (1481 allocations: 78.27 KiB)
forw and back
732.290 μs (2198 allocations: 141.42 KiB)
RNN Vec CPU n=20, ts=16
forward
39.080 μs (49 allocations: 83.30 KiB)
backward
172.115 μs (1498 allocations: 350.16 KiB)
forw and back
259.511 μs (2074 allocations: 754.06 KiB)
RNN Vec CUDA n=20, ts=16
forward
470.589 μs (1473 allocations: 102.42 KiB)
backward
1.477 ms (5897 allocations: 315.33 KiB)
forw and back
2.144 ms (8522 allocations: 548.14 KiB)
RNN Vec CPU n=20, ts=64
forward
157.590 μs (193 allocations: 337.94 KiB)
backward
672.133 μs (5914 allocations: 1.38 MiB)
forw and back
999.957 μs (8026 allocations: 2.94 MiB)
RNN Vec CUDA n=20, ts=64
forward
1.639 ms (5889 allocations: 409.56 KiB)
backward
5.262 ms (23562 allocations: 1.23 MiB)
forw and back
7.663 ms (33820 allocations: 2.12 MiB)
RNN Vec CPU n=200, ts=1
forward
396.093 μs (6 allocations: 313.53 KiB)
backward
431.399 μs (122 allocations: 946.98 KiB)
forw and back
657.308 μs (221 allocations: 1.56 MiB)
RNN Vec CUDA n=200, ts=1
forward
103.224 μs (94 allocations: 6.47 KiB)
backward
236.049 μs (380 allocations: 18.94 KiB)
forw and back
372.350 μs (664 allocations: 41.84 KiB)
RNN Vec CPU n=200, ts=4
forward
899.520 μs (24 allocations: 1.68 MiB)
backward
1.427 ms (422 allocations: 5.52 MiB)
forw and back
2.166 ms (635 allocations: 8.95 MiB)
RNN Vec CUDA n=200, ts=4
forward
203.731 μs (373 allocations: 25.70 KiB)
backward
542.370 μs (1496 allocations: 78.50 KiB)
forw and back
809.731 μs (2266 allocations: 144.06 KiB)
RNN Vec CPU n=200, ts=16
forward
3.709 ms (96 allocations: 7.17 MiB)
backward
6.234 ms (1623 allocations: 23.92 MiB)
forw and back
9.189 ms (2293 allocations: 38.51 MiB)
RNN Vec CUDA n=200, ts=16
forward
563.140 μs (1489 allocations: 102.67 KiB)
backward
1.662 ms (5960 allocations: 316.31 KiB)
forw and back
2.424 ms (8674 allocations: 552.09 KiB)
RNN Vec CPU n=200, ts=64
forward
15.238 ms (384 allocations: 29.15 MiB)
backward
25.381 ms (6423 allocations: 97.52 MiB)
forw and back
39.505 ms (8917 allocations: 156.73 MiB)
RNN Vec CUDA n=200, ts=64
forward
1.961 ms (5953 allocations: 410.56 KiB)
backward
6.072 ms (23817 allocations: 1.24 MiB)
forw and back
9.022 ms (34308 allocations: 2.13 MiB)
RNN Vec CPU n=1000, ts=1
forward
7.714 ms (6 allocations: 7.63 MiB)
backward
13.385 ms (122 allocations: 22.91 MiB)
forw and back
19.695 ms (221 allocations: 38.20 MiB)
RNN Vec CUDA n=1000, ts=1
forward
527.965 μs (168 allocations: 9.81 KiB)
backward
1.195 ms (476 allocations: 22.62 KiB)
forw and back
1.896 ms (834 allocations: 48.88 KiB)
RNN Vec CPU n=1000, ts=4
forward
62.978 ms (24 allocations: 41.97 MiB)
backward
97.428 ms (422 allocations: 137.40 MiB)
forw and back
145.270 ms (635 allocations: 221.40 MiB)
RNN Vec CUDA n=1000, ts=4
forward
2.298 ms (537 allocations: 30.45 KiB)
backward
4.818 ms (1745 allocations: 84.58 KiB)
forw and back
7.162 ms (2679 allocations: 154.89 KiB)
RNN Vec CPU n=1000, ts=16
forward
266.436 ms (96 allocations: 179.30 MiB)
backward
424.925 ms (1623 allocations: 595.36 MiB)
forw and back
621.772 ms (2293 allocations: 954.19 MiB)
RNN Vec CUDA n=1000, ts=16
forward
7.745 ms (2013 allocations: 113.05 KiB)
backward
19.628 ms (6821 allocations: 331.95 KiB)
forw and back
28.773 ms (10059 allocations: 578.11 KiB)
RNN Vec CPU n=1000, ts=64
forward
1.092 s (384 allocations: 728.62 MiB)
backward
1.725 s (6423 allocations: 2.37 GiB)
forw and back
2.519 s (8917 allocations: 3.79 GiB)
RNN Vec CUDA n=1000, ts=64
forward
30.842 ms (7917 allocations: 443.44 KiB)
backward
79.725 ms (27126 allocations: 1.29 MiB)
forw and back
117.308 ms (39581 allocations: 2.22 MiB)
RNN Block CPU n=2, ts=1
forward
1.175 μs (14 allocations: 640 bytes)
backward
10.660 μs (132 allocations: 6.89 KiB)
forw and back
29.692 μs (261 allocations: 36.22 KiB)
RNN Block CUDA n=2, ts=1
forward
124.708 μs (137 allocations: 8.02 KiB)
backward
249.134 μs (439 allocations: 21.31 KiB)
forw and back
417.532 μs (747 allocations: 46.67 KiB)
RNN Block CPU n=2, ts=4
forward
1.690 μs (23 allocations: 1.39 KiB)
backward
28.258 μs (423 allocations: 24.09 KiB)
forw and back
58.840 μs (642 allocations: 98.61 KiB)
RNN Block CUDA n=2, ts=4
forward
214.211 μs (503 allocations: 30.58 KiB)
backward
547.575 μs (1678 allocations: 86.39 KiB)
forw and back
837.096 μs (2466 allocations: 150.77 KiB)
RNN Block CPU n=2, ts=16
forward
3.546 μs (59 allocations: 4.50 KiB)
backward
92.498 μs (1588 allocations: 92.56 KiB)
forw and back
155.328 μs (2168 allocations: 347.64 KiB)
RNN Block CUDA n=2, ts=16
forward
554.636 μs (1967 allocations: 120.86 KiB)
backward
1.607 ms (6634 allocations: 346.55 KiB)
forw and back
2.333 ms (9342 allocations: 566.83 KiB)
RNN Block CPU n=2, ts=64
forward
10.812 μs (203 allocations: 16.88 KiB)
backward
345.428 μs (6244 allocations: 366.75 KiB)
forw and back
534.571 μs (8264 allocations: 1.31 MiB)
RNN Block CUDA n=2, ts=64
forward
1.886 ms (7823 allocations: 482.00 KiB)
backward
5.731 ms (26459 allocations: 1.35 MiB)
forw and back
8.162 ms (36848 allocations: 2.18 MiB)
RNN Block CPU n=20, ts=1
forward
3.241 μs (14 allocations: 5.77 KiB)
backward
14.694 μs (132 allocations: 18.84 KiB)
forw and back
35.691 μs (261 allocations: 56.69 KiB)
RNN Block CUDA n=20, ts=1
forward
123.940 μs (137 allocations: 8.02 KiB)
backward
250.611 μs (439 allocations: 21.31 KiB)
forw and back
416.753 μs (747 allocations: 46.67 KiB)
RNN Block CPU n=20, ts=4
forward
11.016 μs (23 allocations: 26.28 KiB)
backward
48.984 μs (423 allocations: 90.91 KiB)
forw and back
88.382 μs (642 allocations: 203.81 KiB)
RNN Block CUDA n=20, ts=4
forward
222.157 μs (503 allocations: 30.58 KiB)
backward
550.462 μs (1681 allocations: 86.44 KiB)
forw and back
855.562 μs (2512 allocations: 153.28 KiB)
RNN Block CPU n=20, ts=16
forward
42.542 μs (60 allocations: 108.61 KiB)
backward
179.538 μs (1589 allocations: 379.05 KiB)
forw and back
280.389 μs (2170 allocations: 792.23 KiB)
RNN Block CUDA n=20, ts=16
forward
568.705 μs (1967 allocations: 120.86 KiB)
backward
1.617 ms (6649 allocations: 346.78 KiB)
forw and back
2.371 ms (9400 allocations: 569.53 KiB)
RNN Block CPU n=20, ts=64
forward
163.198 μs (204 allocations: 438.25 KiB)
backward
704.353 μs (6245 allocations: 1.50 MiB)
forw and back
1.056 ms (8266 allocations: 3.07 MiB)
RNN Block CUDA n=20, ts=64
forward
1.954 ms (7823 allocations: 482.00 KiB)
backward
5.747 ms (26522 allocations: 1.36 MiB)
forw and back
8.229 ms (36954 allocations: 2.18 MiB)
RNN Block CPU n=200, ts=1
forward
206.012 μs (17 allocations: 470.09 KiB)
backward
289.603 μs (138 allocations: 1.08 MiB)
forw and back
455.803 μs (272 allocations: 1.87 MiB)
RNN Block CUDA n=200, ts=1
forward
131.274 μs (140 allocations: 8.06 KiB)
backward
262.678 μs (442 allocations: 21.36 KiB)
forw and back
450.127 μs (797 allocations: 49.25 KiB)
RNN Block CPU n=200, ts=4
forward
926.796 μs (35 allocations: 2.29 MiB)
backward
1.545 ms (453 allocations: 6.14 MiB)
forw and back
2.288 ms (692 allocations: 9.72 MiB)
RNN Block CUDA n=200, ts=4
forward
245.877 μs (515 allocations: 30.77 KiB)
backward
601.164 μs (1696 allocations: 86.67 KiB)
forw and back
924.747 μs (2543 allocations: 153.77 KiB)
RNN Block CPU n=200, ts=16
forward
3.883 ms (107 allocations: 9.62 MiB)
backward
6.601 ms (1714 allocations: 26.37 MiB)
forw and back
9.613 ms (2374 allocations: 41.11 MiB)
RNN Block CUDA n=200, ts=16
forward
655.443 μs (2015 allocations: 121.61 KiB)
backward
1.815 ms (6712 allocations: 347.77 KiB)
forw and back
2.756 ms (9527 allocations: 571.52 KiB)
RNN Block CPU n=200, ts=64
forward
15.904 ms (395 allocations: 38.92 MiB)
backward
26.704 ms (6754 allocations: 107.30 MiB)
forw and back
41.180 ms (9094 allocations: 166.69 MiB)
RNN Block CUDA n=200, ts=64
forward
2.262 ms (8015 allocations: 485.00 KiB)
backward
6.885 ms (26777 allocations: 1.36 MiB)
forw and back
9.792 ms (37465 allocations: 2.19 MiB)
RNN Block CPU n=1000, ts=1
forward
8.044 ms (19 allocations: 11.45 MiB)
backward
13.822 ms (138 allocations: 26.72 MiB)
forw and back
20.431 ms (274 allocations: 45.83 MiB)
RNN Block CUDA n=1000, ts=1
forward
611.569 μs (216 allocations: 11.44 KiB)
backward
1.185 ms (538 allocations: 25.05 KiB)
forw and back
1.819 ms (969 allocations: 56.31 KiB)
RNN Block CPU n=1000, ts=4
forward
66.125 ms (37 allocations: 57.23 MiB)
backward
101.235 ms (453 allocations: 152.66 MiB)
forw and back
150.217 ms (694 allocations: 240.47 MiB)
RNN Block CUDA n=1000, ts=4
forward
2.263 ms (681 allocations: 35.55 KiB)
backward
4.997 ms (1945 allocations: 92.75 KiB)
forw and back
7.396 ms (2958 allocations: 164.62 KiB)
RNN Block CPU n=1000, ts=16
forward
279.050 ms (109 allocations: 240.33 MiB)
backward
441.440 ms (1714 allocations: 656.40 MiB)
forw and back
642.376 ms (2376 allocations: 1019.05 MiB)
RNN Block CUDA n=1000, ts=16
forward
8.139 ms (2541 allocations: 132.02 KiB)
backward
20.460 ms (7573 allocations: 363.41 KiB)
forw and back
29.609 ms (10914 allocations: 597.56 KiB)
RNN Block CPU n=1000, ts=64
forward
1.131 s (397 allocations: 972.76 MiB)
backward
1.792 s (6754 allocations: 2.61 GiB)
forw and back
3.213 s (9096 allocations: 4.04 GiB)
RNN Block CUDA n=1000, ts=64
forward
32.230 ms (9981 allocations: 517.91 KiB)
backward
83.352 ms (30086 allocations: 1.41 MiB)
forw and back
121.319 ms (42740 allocations: 2.27 MiB)