Publication:
CPU- and GPU-initiated communication strategies for conjugate gradient methods on large GPU clusters

Co-Authors

Trotter, James D.
Langguth, Johannes
Cai, Xing

Embargo Status

No

Abstract

The Conjugate Gradient (CG) method is a key building block in numerous applications, yet its low computational intensity and sensitivity to communication overhead make it difficult to scale efficiently on multi-GPU systems. In light of recent advances in multi-GPU communication technologies, we revisit CG parallelization for large-scale GPU clusters. This work presents scalable CG and pipelined CG solvers targeting NVIDIA and AMD GPUs, using GPU-aware MPI, NCCL/RCCL and NVSHMEM to implement both CPU- and GPU-initiated communication schemes. We also introduce a monolithic variant that offloads the entire CG loop to the GPU, enabling fully device-initiated execution via NVSHMEM. Optimizations across all variants reduce unnecessary data transfers and synchronization overheads; the GPU-initiated variant eliminates CPU involvement altogether. We benchmark our implementations on NVIDIA- and AMD-based supercomputers using SuiteSparse matrices and a real-world finite element application. By avoiding data transfers and synchronization bottlenecks, our single-GPU implementations achieve 8-14% performance gains over state-of-the-art solvers. In strong scaling tests on over 1,000 GPUs, we outperform existing approaches by 5-15%. While CPU-initiated variants remain favorable due to a lack of vendor-supported device-side computational kernels and suboptimal NVSHMEM configurations on the clusters, the strong scaling properties of the GPU-initiated CG variant indicate that it will be highly competitive at even larger GPU counts and with further tuning.
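
To make concrete why CG is sensitive to communication overhead, the following is a minimal single-process sketch of the textbook, unpreconditioned CG iteration in plain Python. It is not the authors' implementation and uses no GPU or communication library; the helper names `dot`, `matvec` and `cg`, and the 2x2 test system, are illustrative assumptions. The comments mark the steps that, in a distributed solver, would typically require a global reduction or a neighbor exchange in every iteration.

```python
# Illustrative sketch only: textbook CG on a small dense SPD system.
# All names here are hypothetical and not taken from the paper.

def dot(x, y):
    # In a distributed solver: local partial sum, then a global all-reduce.
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    # In a distributed solver: exchange boundary entries of x with
    # neighboring processes, then multiply with the local matrix rows.
    return [dot(row, x) for row in A]

def cg(A, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = list(b)              # residual r = b - A x, with x = 0
    p = list(r)              # initial search direction
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)          # first reduction of the iteration
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)               # second reduction of the iteration
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

A = [[4.0, 1.0], [1.0, 3.0]]   # small symmetric positive definite matrix
b = [1.0, 2.0]
x, iters = cg(A, b)
```

Each iteration performs one matrix-vector product and two dot products but only a handful of vector updates, so the reductions and the exchange can dominate run time at scale. Pipelined CG formulations generally restructure the recurrence so that the reduction can overlap with the matrix-vector product.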

Publisher

Association for Computing Machinery

Subject

Computer Engineering

Source

Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025

DOI

10.1145/3712285.3759774

Rights

CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)

Copyrights Note

Creative Commons license

Except where otherwise noted, this item's license is described as CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)
