Publication: Snoopie: a multi-GPU communication profiler and visualizer
Publication Date
2024
Language
en
Type
Conference proceeding
Abstract
With data movement becoming one of the most expensive bottlenecks in computing, profiling tools that analyze communication are crucial for effectively scaling multi-GPU applications. While existing profiling tools, including first-party software from GPU vendors, are robust and excel at capturing compute operations within a single GPU, their support for monitoring GPU-GPU data transfers and calls issued by communication libraries is currently inadequate. To fill these gaps, we introduce Snoopie, an instrumentation-based multi-GPU communication profiling tool built on NVBit, capable of tracking peer-to-peer transfers and GPU-centric communication library calls. To increase programmer productivity, Snoopie can attribute data movement to the source code line and the data objects involved. It comes with multiple visualization modes at varying granularities, from a coarse view of data movement in the system as a whole to specific instructions and addresses. Our case studies demonstrate Snoopie's effectiveness in monitoring data movement, locating performance bugs in applications, and understanding the concrete data transfers abstracted beneath communication libraries. The tool is publicly available at https://github.com/ParCoreLab/snoopie.
Description
Source:
Proceedings of the 38th ACM International Conference on Supercomputing, ACM ICS 2024
Publisher:
Association for Computing Machinery (ACM)
Subject
Computer science, Artificial intelligence, Hardware and architecture, Theory and methods