Publication:
Multi-GPU communication schemes for iterative solvers: when CPUs are not in charge

dc.contributor.coauthor: Wahib, Mohamed
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.kuauthor: Erten, Didem Unat
dc.contributor.kuauthor: Baydamirli, Javid
dc.contributor.kuauthor: Sağbili, Doğan
dc.contributor.kuauthor: Ismayilov, Ismayil
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2025-01-19T10:31:47Z
dc.date.issued: 2023
dc.description.abstract: This paper proposes a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical multi-GPU application, the host serves as the orchestrator of execution by directly launching kernels, issuing communication calls, and acting as a synchronizer for devices. We argue that this orchestration, or control flow path, causes undue overhead and can be delegated entirely to devices to improve performance in applications that require communication among peers. For the proposed CPU-free execution model, we leverage existing techniques such as persistent kernels, thread block specialization, device-side barriers, and device-initiated communication routines to write fully autonomous multi-GPU code and achieve significantly reduced communication overheads. We demonstrate our proposed model on two broadly used iterative solvers, 2D/3D Jacobi stencil and Conjugate Gradient (CG). Compared to the CPU-controlled baselines, the CPU-free model can improve 3D stencil communication latency by 58.8% and provide a 1.63x speedup for CG on 8 NVIDIA A100 GPUs. The project code is available at https://github.com/ParCoreLab/CPU-Free-model. © 2023 Owner/Author(s).
dc.description.indexedby: Scopus
dc.description.openaccess: All Open Access; Bronze Open Access
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 949587).
dc.identifier.doi: 10.1145/3577193.3593713
dc.identifier.isbn: 979-840070056-9
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85168413592
dc.identifier.uri: https://doi.org/10.1145/3577193.3593713
dc.identifier.uri: https://hdl.handle.net/20.500.14288/26289
dc.keywords: GPU-initiated communication
dc.keywords: Iterative solvers
dc.keywords: Multi-GPU
dc.keywords: NVSHMEM
dc.keywords: Persistent kernels
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.relation.grantno: Horizon 2020 Framework Programme, H2020, (949587); European Research Council, ERC
dc.relation.ispartof: Proceedings of the International Conference on Supercomputing
dc.subject: Computer science
dc.title: Multi-GPU communication schemes for iterative solvers: when CPUs are not in charge
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Erten, Didem Unat
local.contributor.kuauthor: Sağbili, Doğan
local.contributor.kuauthor: Baydamirli, Javid
local.contributor.kuauthor: Ismayilov, Ismayil
local.publication.orgunit1: College of Engineering
local.publication.orgunit1: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit2: Department of Computer Engineering
local.publication.orgunit2: Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
