Publication:
Snoopie: a multi-GPU communication profiler and visualizer

Abstract

With data movement becoming one of the most expensive bottlenecks in computing, profiling tools that analyze communication are crucial for effectively scaling multi-GPU applications. While existing profiling tools, including first-party software from GPU vendors, are robust and excel at capturing compute operations within a single GPU, their support for monitoring GPU-GPU data transfers and calls issued by communication libraries is currently inadequate. To fill these gaps, we introduce Snoopie, an instrumentation-based multi-GPU communication profiling tool built on NVBit, capable of tracking peer-to-peer transfers and GPU-centric communication library calls. To increase programmer productivity, Snoopie can attribute data movement to the source code line and the data objects involved. It comes with multiple visualization modes at varying granularities, from a coarse view of the data movement in the system as a whole to specific instructions and addresses. Our case studies demonstrate Snoopie's effectiveness in monitoring data movement, locating performance bugs in applications, and understanding concrete data transfers abstracted beneath communication libraries. The tool is publicly available at https://github.com/ParCoreLab/snoopie.

Publisher

Association for Computing Machinery

Subject

Computer science, Artificial intelligence, Hardware and architecture, Theory and methods

Source

Proceedings of the 38th ACM International Conference on Supercomputing, ACM ICS 2024

DOI

10.1145/3650200.3656597
