Guest @JamesReinders. website
- On device parallelism and multi-device parallelism
- OSDI 2021 Keynote on modern hardware having many chips
- Where is the parallelism in the data?
- Amdahl's law: the serial part of a program caps the speedup. With larger data there are more chances for parallelism.
- Understand the data flow. What are the device limits? Registers needed, cache line size, IO buffer size, ...?
- Excited about compute kernels that auto-vectorize.
- J language dense multiply kernel
- Non-Uniform Memory Access (NUMA) - pooling of virtual memory across local devices.
- Algorithms beat hardware
- Where is the program stalling/locking?
- What is the cheapest way to test your theories of how a program is behaving? Simple macro/printf.
- Are you over-computing the precision of anything?
- Threading Building Blocks (TBB) - rayon-rs - crossbeam-rs - Open ticket: Adding NUMA support to Rayon
- Supercomputer vs data center - supercomputer is about throughput.
- Kokkos
- RAJA
- Circle-lang
- Design libraries to be multi-device from the start.
- SYCL
- Libraries are the best way to use hardware.
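
The Amdahl's law item above can be made concrete with a little arithmetic. This is a minimal sketch, assuming a fraction `p` of the runtime parallelizes perfectly across `n` workers. The function name `amdahl_speedup` and the sample fractions are illustrative assumptions, not figures from the episode.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup per Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical fractions: a larger input often shrinks the serial share,
    # which is modeled here simply as a larger p.
    for p in (0.90, 0.99):
        for n in (8, 64, 1024):
            print(f"p={p:.2f} n={n:5d} speedup <= {amdahl_speedup(p, n):.2f}")
```

With `p = 0.90` the bound stays below 10x no matter how many workers are added, because the 10% serial part dominates. Raising `p`, for example by growing the data so the parallel part outweighs setup costs, lifts that ceiling.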