Publication:
Snoopie: a multi-GPU communication profiler and visualizer

Publication Date

2024

Language

en

Type

Conference proceeding

Abstract

With data movement becoming one of the most expensive bottlenecks in computing, profiling tools that analyze communication are crucial for effectively scaling multi-GPU applications. While existing profiling tools, including first-party software by GPU vendors, are robust and excel at capturing compute operations within a single GPU, support for monitoring GPU-GPU data transfers and calls issued by communication libraries is currently inadequate. To fill these gaps, we introduce Snoopie, an instrumentation-based multi-GPU communication profiling tool built on NVBit, capable of tracking peer-to-peer transfers and GPU-centric communication library calls. To increase programmer productivity, Snoopie can attribute data movement to the source code line and the data objects involved. It comes with multiple visualization modes at varying granularities, from a coarse view of the data movement in the system as a whole to specific instructions and addresses. Our case studies demonstrate Snoopie's effectiveness in monitoring data movement, locating performance bugs in applications, and understanding concrete data transfers abstracted beneath communication libraries. The tool is publicly available at https://github.com/ParCoreLab/snoopie.

Description

Source:

Proceedings of the 38th ACM International Conference on Supercomputing, ACM ICS 2024

Publisher:

Association for Computing Machinery

Subject

Computer science, Artificial intelligence, Hardware and architecture, Theory and methods
