Publication: A Device-Side Execution Model for Multi-GPU Task Graphs
KU Authors
Co-Authors
Turimbetov, Ilyas (57211567866)
Wahib, Mohamed (60172528700)
Unat, Didem (27868216500)
Embargo Status
No
Abstract
Executing task graphs on multi-GPU systems presents challenges typically managed by CPU-side runtimes, which handle memory management, track dependencies, and balance load. However, the interplay of runtime components, CPU-driven kernel initialization, and dynamic task graph construction creates significant overhead. For static graphs, recent advancements have enabled GPU-side execution, demonstrating substantial performance gains in single-GPU scenarios. However, multi-GPU execution still lags behind in both usability and performance. In particular, no GPU-side solution exists for executing task graphs on multiple nodes. In this work, we introduce Mustard, a multi-GPU execution model that shifts execution of static task graphs entirely to the devices, drastically reducing overhead. Mustard offers a clean solution for executing CUDA graphs across multiple GPUs on multiple nodes without requiring modifications to GPU kernel code or the adoption of new runtime mechanisms or APIs. By transforming the task graph, Mustard enables precise tracking of task dependencies and load balancing directly on the GPU, eliminating the need for host CPU involvement. We evaluate our approach using generated graphs, as well as LU and Cholesky decomposition graphs. In a multi-node scenario with 64 GPUs, Mustard achieves an average 5.83× speedup over the linear algebra library SLATE. On a single node, compared to the best-performing baseline, Mustard delivers an average 1.66× speedup for LU and 1.29× for Cholesky. © 2025 Copyright held by the owner/author(s).
Publisher
Association for Computing Machinery
Source
39th ACM International Conference on Supercomputing, ICS 2025
DOI
10.1145/3721145.3730426
Rights
Copyrighted
