Publication:
A Device-Side Execution Model for Multi-GPU Task Graphs

Co-Authors

Turimbetov, Ilyas (57211567866)
Wahib, Mohamed (60172528700)
Unat, Didem (27868216500)

Embargo Status

No

Abstract

Executing task graphs on multi-GPU systems presents challenges typically managed by CPU-side runtimes, which handle memory management, track dependencies, and balance load. However, the interplay of runtime components, CPU-driven kernel initialization, and dynamic task graph construction creates significant overhead. For static graphs, recent advancements have enabled GPU-side execution, demonstrating substantial performance gains in single-GPU scenarios. However, multi-GPU execution still lags behind in both usability and performance. In particular, no GPU-side solution exists for executing task graphs on multiple nodes. In this work, we introduce Mustard, a multi-GPU execution model that shifts execution of static task graphs entirely to the devices, drastically reducing overhead. Mustard offers a clean solution for executing CUDA graphs across multiple GPUs on multiple nodes without requiring modifications to GPU kernel code or the adoption of new runtime mechanisms or APIs. By transforming the task graph, Mustard enables precise tracking of task dependencies and load balancing directly on the GPU, eliminating the need for host CPU involvement. We evaluate our approach using generated graphs, as well as LU and Cholesky decomposition graphs. In a multi-node scenario with 64 GPUs, Mustard achieves an average 5.83× speedup over the linear algebra library SLATE. On a single node, compared to the best-performing baseline, Mustard delivers an average 1.66× speedup for LU and 1.29× for Cholesky. © 2025 Copyright held by the owner/author(s).

Publisher

Association for Computing Machinery

Source

39th ACM International Conference on Supercomputing, ICS 2025

DOI

10.1145/3721145.3730426

Rights

Copyrighted
