How to allocate an XLA buffer that NCCL has registered as symmetric memory
(ncclCommWindowRegister(..., NCCL_WIN_COLL_SYMMETRIC)), access it from a custom
GPU kernel through the NCCL device API (ncclGetLsaPointer / multimem), and
drive the whole thing from JAX via an FFI custom call.
This recipe tracks openxla/xla at HEAD (mid‑2026) and NCCL ≥ 2.27. It is built
around two concrete pieces of upstream code: