Publication: CPU- and GPU-initiated communication strategies for conjugate gradient methods on large GPU clusters
Co-Authors
Trotter, James D.
Langguth, Johannes
Cai, Xing
Embargo Status
No
Abstract
The Conjugate Gradient (CG) method is a key building block in numerous applications, yet its low computational intensity and sensitivity to communication overhead make it difficult to scale efficiently on multi-GPU systems. In light of recent advances in multi-GPU communication technologies, we revisit CG parallelization for large-scale GPU clusters. This work presents scalable CG and pipelined CG solvers targeting NVIDIA and AMD GPUs, using GPU-aware MPI, NCCL/RCCL and NVSHMEM to implement both CPU- and GPU-initiated communication schemes. We also introduce a monolithic variant that offloads the entire CG loop to the GPU, enabling fully device-initiated execution via NVSHMEM. Optimizations across all variants reduce unnecessary data transfers and synchronization overheads; the GPU-initiated variant eliminates CPU involvement altogether. We benchmark our implementations on NVIDIA- and AMD-based supercomputers using SuiteSparse matrices and a real-world finite element application. By avoiding data transfers and synchronization bottlenecks, our single-GPU implementations achieve 8-14% performance gains over state-of-the-art solvers. In strong scaling tests on over 1,000 GPUs, we outperform existing approaches by 5-15%. While CPU-initiated variants remain favorable due to a lack of vendor-supported device-side computational kernels and suboptimal NVSHMEM configurations on the clusters, the strong scaling properties of the GPU-initiated CG variant indicate that it will be highly competitive at even larger GPU counts and with further tuning.
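To illustrate why CG is communication-sensitive, the following is a minimal sequential sketch, not the authors' GPU implementation. It assumes a symmetric positive definite matrix stored in CSR form and no preconditioner. In a distributed solver, each dot() call would become a global reduction (for example an MPI all-reduce) and each spmv() call a neighbor halo exchange plus local work. The pipelined variant shown follows the unpreconditioned formulation of Ghysels and Vanroose; whether it matches the exact pipelined CG used in the paper is an assumption. All function names here are hypothetical.

```python
def spmv(rowptr, colidx, vals, x):
    """Hypothetical helper: y = A x for A in CSR format (distributed: halo exchange)."""
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += vals[k] * x[colidx[k]]
    return y


def dot(a, b):
    """Inner product (distributed: a global reduction across all ranks)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha, x, y):
    """Return y + alpha * x as a new list."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]


def cg(A, b, tol=1e-10, maxit=1000):
    """Standard CG: two dependent reductions per iteration."""
    x = [0.0] * len(b)
    r = list(b)                      # r0 = b - A x0 with x0 = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(maxit):
        if rr ** 0.5 < tol:
            break
        q = spmv(*A, p)
        alpha = rr / dot(p, q)       # reduction 1: must finish before updating x, r
        x = axpy(alpha, p, x)
        r = axpy(-alpha, q, r)
        rr_new = dot(r, r)           # reduction 2: depends on the updated r
        p = axpy(rr_new / rr, p, r)  # p = r + beta * p
        rr = rr_new
    return x


def pipelined_cg(A, b, tol=1e-10, maxit=1000):
    """Pipelined CG: both inner products are independent of the SpMV,
    so they can be fused into one reduction that overlaps with it."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual, x0 = 0
    w = spmv(*A, r)                  # w = A r
    z, s, p = [0.0] * n, [0.0] * n, [0.0] * n
    gamma_old = alpha_old = None
    for _ in range(maxit):
        gamma, delta = dot(r, r), dot(w, r)   # one fused reduction
        if gamma ** 0.5 < tol:
            break
        q = spmv(*A, w)              # may run concurrently with the reduction
        if gamma_old is None:
            beta, alpha = 0.0, gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        z = axpy(beta, z, q)         # z = A s
        s = axpy(beta, s, w)         # s = A p
        p = axpy(beta, p, r)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, s, r)
        w = axpy(-alpha, z, w)       # w = A r, kept by recurrence
        gamma_old, alpha_old = gamma, alpha
    return x
```

For example, with the 3x3 tridiagonal SPD matrix `A = ([0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2], [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0])` and `b = [1.0, 2.0, 3.0]`, both `cg(A, b)` and `pipelined_cg(A, b)` return vectors `x` with `spmv(*A, x)` close to `b`. The trade-off is extra vector updates and memory in exchange for a single, overlappable synchronization point per iteration.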
Publisher
Association for Computing Machinery
Subject
Computer Engineering
Source
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
DOI
10.1145/3712285.3759774
Rights
CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)
Copyrights Note
Creative Commons license
Except where otherwise noted, this item's license is described as CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)

