Multi-GPU communication schemes for iterative solvers: when CPUs are not in charge

dc.contributor.coauthor	Wahib, Mohamed
dc.contributor.department	Department of Computer Engineering
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.facultymember	Yes
dc.contributor.kuauthor	Erten, Didem Unat
dc.contributor.kuauthor	Baydamirli, Javid
dc.contributor.kuauthor	Sağbili, Doğan
dc.contributor.kuauthor	Ismayilov, Ismayil
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.contributor.schoolcollegeinstitute	Graduate School of Sciences and Engineering
dc.date.accessioned	2025-01-19T10:31:47Z
dc.date.issued	2023
dc.description.abstract	This paper proposes a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical multi-GPU application, the host serves as the orchestrator of execution by directly launching kernels, issuing communication calls, and acting as a synchronizer for devices. We argue that this orchestration, or control flow path, causes undue overhead and can be delegated entirely to devices to improve performance in applications that require communication among peers. For the proposed CPU-free execution model, we leverage existing techniques such as persistent kernels, thread block specialization, device-side barriers, and device-initiated communication routines to write fully autonomous multi-GPU code and achieve significantly reduced communication overheads. We demonstrate our proposed model on two broadly used iterative solvers, 2D/3D Jacobi stencil and Conjugate Gradient (CG). Compared to the CPU-controlled baselines, the CPU-free model can improve 3D stencil communication latency by 58.8% and provide a 1.63x speedup for CG on 8 NVIDIA A100 GPUs. The project code is available at https://github.com/ParCoreLab/CPU-Free-model. © 2023 Owner/Author(s).
dc.description.fulltext	No
dc.description.harvestedfrom	Manual
dc.description.indexedby	Scopus
dc.description.openaccess	Bronze OA
dc.description.peerreviewstatus
dc.description.publisherscope	International
dc.description.readpublish	N/A
dc.description.sponsoredbyTubitakEu	EU
dc.description.sponsorship	This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 949587).
dc.description.studentonlypublication	No
dc.description.studentpublication	Yes
dc.description.version	Published Version
dc.identifier.doi	10.1145/3577193.3593713
dc.identifier.embargo	No
dc.identifier.grantno	949587
dc.identifier.isbn	9798400700569
dc.identifier.scopus	2-s2.0-85168413592
dc.identifier.uri	https://doi.org/10.1145/3577193.3593713
dc.identifier.uri	https://hdl.handle.net/20.500.14288/26289
dc.keywords	GPU-initiated communication
dc.keywords	Iterative solvers
dc.keywords	Multi-GPU
dc.keywords	NVSHMEM
dc.keywords	Persistent kernels
dc.language.iso	eng
dc.publisher	Association for Computing Machinery
dc.relation.affiliation	Koç University
dc.relation.collection	Koç University Institutional Repository
dc.relation.ispartof	Proceedings of the International Conference on Supercomputing
dc.relation.openaccess
dc.rights	N/A
dc.rights.uri	N/A
dc.subject	Computer science
dc.title	Multi-GPU communication schemes for iterative solvers: when CPUs are not in charge
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Erten, Didem Unat
local.contributor.kuauthor	Sağbili, Doğan
local.contributor.kuauthor	Baydamirli, Javid
local.contributor.kuauthor	Ismayilov, Ismayil
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Collections

Publications without Fulltext