@quasiben
Last active May 12, 2022 13:25
-------------------------------
backend | dask
merge type | gpu
rows-per-chunk | 50000000
base-chunks | 4
other-chunks | 4
broadcast | default
protocol | ucx
device(s) | 0
rmm-pool | False
frac-match | 0.3
tcp | None
ib | None
nvlink | None
data-processed | 5.96 GiB
===============================
Wall-clock | Throughput
-------------------------------
226.49 ms | 26.32 GiB/s
217.98 ms | 27.34 GiB/s
240.57 ms | 24.78 GiB/s
207.36 ms | 28.74 GiB/s
253.45 ms | 23.52 GiB/s
237.75 ms | 25.07 GiB/s
215.30 ms | 27.68 GiB/s
223.77 ms | 26.64 GiB/s
229.77 ms | 25.94 GiB/s
208.81 ms | 28.55 GiB/s
===============================
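Each throughput figure in the table above appears to be the data-processed size divided by the wall-clock time of that run. A minimal Python sketch of that arithmetic, using the first three rows of this block (the function name `throughput_gib_s` is hypothetical and not part of the benchmark script):

```python
def throughput_gib_s(data_gib: float, wall_clock_s: float) -> float:
    """Throughput in GiB/s for a single run."""
    return data_gib / wall_clock_s


# Values copied from the block above (4 base chunks, 4 other chunks).
DATA_GIB = 5.96
wall_clock_ms = [226.49, 217.98, 240.57]
reported_gib_s = [26.32, 27.34, 24.78]

computed_gib_s = [throughput_gib_s(DATA_GIB, ms / 1000.0) for ms in wall_clock_ms]

for ms, got, want in zip(wall_clock_ms, computed_gib_s, reported_gib_s):
    # Small differences are expected because the printed size is rounded.
    print(f"{ms:7.2f} ms -> {got:.2f} GiB/s (reported {want:.2f} GiB/s)")
```

The recomputed values agree with the reported ones to within rounding of the printed data size.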
-------------------------------
backend | dask
merge type | gpu
rows-per-chunk | 50000000
base-chunks | 8
other-chunks | 8
broadcast | default
protocol | ucx
device(s) | 0
rmm-pool | False
frac-match | 0.3
tcp | None
ib | None
nvlink | None
data-processed | 11.92 GiB
===============================
Wall-clock | Throughput
-------------------------------
981.79 ms | 12.14 GiB/s
879.93 ms | 13.55 GiB/s
854.19 ms | 13.96 GiB/s
865.64 ms | 13.77 GiB/s
874.47 ms | 13.63 GiB/s
793.21 ms | 15.03 GiB/s
701.13 ms | 17.00 GiB/s
767.72 ms | 15.53 GiB/s
1.30 s | 9.15 GiB/s
1.49 s | 7.99 GiB/s
===============================
-------------------------------
backend | dask
merge type | gpu
rows-per-chunk | 50000000
base-chunks | 16
other-chunks | 16
broadcast | default
protocol | ucx
device(s) | 0
rmm-pool | False
frac-match | 0.3
tcp | None
ib | None
nvlink | None
data-processed | 23.84 GiB
===============================
Wall-clock | Throughput
-------------------------------
2.04 s | 11.66 GiB/s
1.72 s | 13.88 GiB/s
1.87 s | 12.78 GiB/s
2.21 s | 10.78 GiB/s
2.78 s | 8.59 GiB/s
2.39 s | 9.96 GiB/s
2.31 s | 10.34 GiB/s
2.49 s | 9.58 GiB/s
2.41 s | 9.90 GiB/s
2.74 s | 8.70 GiB/s
===============================
-------------------------------
backend | dask
merge type | gpu
rows-per-chunk | 50000000
base-chunks | 32
other-chunks | 32
broadcast | default
protocol | ucx
device(s) | 0
rmm-pool | False
frac-match | 0.3
tcp | None
ib | None
nvlink | None
data-processed | 47.68 GiB
===============================
Wall-clock | Throughput
-------------------------------
11.83 s | 4.03 GiB/s
2.85 s | 16.73 GiB/s
3.64 s | 13.12 GiB/s
4.13 s | 11.54 GiB/s
4.10 s | 11.63 GiB/s
3.68 s | 12.97 GiB/s
3.72 s | 12.83 GiB/s
3.96 s | 12.04 GiB/s
3.43 s | 13.89 GiB/s
2.97 s | 16.07 GiB/s
===============================
-------------------------------
backend | dask
merge type | gpu
rows-per-chunk | 50000000
base-chunks | 64
other-chunks | 64
broadcast | default
protocol | ucx
device(s) | 0
rmm-pool | False
frac-match | 0.3
tcp | None
ib | None
nvlink | None
data-processed | 95.37 GiB
===============================
Wall-clock | Throughput
-------------------------------
3.51 s | 27.18 GiB/s
3.87 s | 24.65 GiB/s
3.87 s | 24.62 GiB/s
3.76 s | 25.33 GiB/s
4.08 s | 23.36 GiB/s
3.93 s | 24.27 GiB/s
3.80 s | 25.10 GiB/s
3.87 s | 24.64 GiB/s
4.10 s | 23.25 GiB/s
3.74 s | 25.47 GiB/s
===============================
-------------------------------
backend | dask
merge type | gpu
rows-per-chunk | 50000000
base-chunks | 128
other-chunks | 128
broadcast | default
protocol | ucx
device(s) | 0
rmm-pool | False
frac-match | 0.3
tcp | None
ib | None
nvlink | None
data-processed | 190.73 GiB
===============================
Wall-clock | Throughput
-------------------------------
5.11 s | 37.35 GiB/s
5.09 s | 37.44 GiB/s
4.61 s | 41.38 GiB/s
4.60 s | 41.50 GiB/s
5.25 s | 36.31 GiB/s
4.94 s | 38.59 GiB/s
5.13 s | 37.19 GiB/s
4.91 s | 38.88 GiB/s
5.26 s | 36.26 GiB/s
4.79 s | 39.84 GiB/s
===============================
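The data-processed sizes in the blocks above double with the chunk count. They are consistent with (base-chunks + other-chunks) x rows-per-chunk x 16 bytes per row. The 16 bytes per row is an inference from the reported sizes, not something the output states, and `data_processed_gib` is a hypothetical helper. A short Python sketch of the check:

```python
ROWS_PER_CHUNK = 50_000_000
BYTES_PER_ROW = 16  # assumption: inferred from the reported sizes, not stated in the output


def data_processed_gib(base_chunks: int, other_chunks: int) -> float:
    """Total size of both merge inputs in GiB under the assumed row width."""
    total_rows = (base_chunks + other_chunks) * ROWS_PER_CHUNK
    return total_rows * BYTES_PER_ROW / 2**30


# chunks per side -> data-processed (GiB) as printed in the blocks above
reported_gib = {4: 5.96, 8: 11.92, 16: 23.84, 32: 47.68, 64: 95.37, 128: 190.73}

for chunks, want in reported_gib.items():
    got = data_processed_gib(chunks, chunks)
    print(f"{chunks:>3} chunks per side: {got:.2f} GiB (reported {want:.2f} GiB)")
```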
@quasiben

1 -- tcp
Throughput | 1.81 GiB/s +/- 244.76 MiB/s
Wall-Clock | 3.36 s +/- 458.07 ms
2 -- tcp
Throughput | 2.81 GiB/s +/- 203.61 MiB/s
Wall-Clock | 4.26 s +/- 310.66 ms
4 -- tcp
Throughput | 5.43 GiB/s +/- 398.78 MiB/s
Wall-Clock | 4.41 s +/- 332.48 ms
8 -- tcp
Throughput | 7.85 GiB/s +/- 1.10 GiB/s
Wall-Clock | 6.22 s +/- 1.04 s
16 -- tcp
Throughput | 10.80 GiB/s +/- 1.01 GiB/s
Wall-Clock | 8.92 s +/- 994.60 ms
32 -- tcp
Throughput | 19.11 GiB/s +/- 889.94 MiB/s
Wall-Clock | 10.00 s +/- 461.28 ms
1 -- ucx
Throughput | 10.46 GiB/s +/- 2.83 GiB/s
Wall-Clock | 606.84 ms +/- 143.16 ms
2 -- ucx
Throughput | 9.62 GiB/s +/- 2.20 GiB/s
Wall-Clock | 1.29 s +/- 240.96 ms
4 -- ucx
Throughput | 14.04 GiB/s +/- 3.94 GiB/s
Wall-Clock | 1.84 s +/- 520.96 ms
8 -- ucx
Throughput | 12.87 GiB/s +/- 3.94 GiB/s
Wall-Clock | 4.46 s +/- 2.68 s
16 -- ucx
Throughput | 25.22 GiB/s +/- 1.87 GiB/s
Wall-Clock | 3.80 s +/- 277.06 ms
32 -- ucx
Throughput | 38.29 GiB/s +/- 3.03 GiB/s
Wall-Clock | 5.01 s +/- 420.34 ms
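To compare the two protocols directly, the mean throughputs in the summary above can be divided pairwise. The Python sketch below does only that. The dictionaries are keyed by the leading number on each "N -- protocol" line, whose meaning the summary does not state, and `speedup` is a hypothetical helper name.

```python
# Mean throughput (GiB/s) copied from the summary above, keyed by the
# leading number on each "N -- protocol" line (its meaning is not stated).
tcp = {1: 1.81, 2: 2.81, 4: 5.43, 8: 7.85, 16: 10.80, 32: 19.11}
ucx = {1: 10.46, 2: 9.62, 4: 14.04, 8: 12.87, 16: 25.22, 32: 38.29}


def speedup(ucx_mean: dict, tcp_mean: dict) -> dict:
    """Ratio of UCX to TCP mean throughput for each shared key."""
    return {n: ucx_mean[n] / tcp_mean[n] for n in sorted(tcp_mean)}


ratios = speedup(ucx, tcp)
for n, ratio in ratios.items():
    print(f"{n:>2}: UCX mean throughput is {ratio:.2f}x TCP")
```

The ratios ignore the reported spread, which is large for some UCX entries (for example 4.46 s +/- 2.68 s), so they are a rough comparison only.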
