Researcher:
Tezcan, Erhan

Job Title

Master Student

First Name

Erhan

Last Name

Tezcan

Name Variants

Tezcan, Erhan

Search Results

Now showing 1 - 3 of 3
  • Publication
    ComScribe: identifying intra-node GPU communication
    (Springer Science and Business Media Deutschland GmbH, 2021) Akhtar, Palwisha (Master Student, Graduate School of Sciences and Engineering); Tezcan, Erhan (Master Student, Graduate School of Sciences and Engineering); Qararyah, Fareed Mohammad (PhD Student, Graduate School of Sciences and Engineering); Erten, Didem Unat (Faculty Member, College of Engineering); Department of Computer Engineering
    GPU communication plays a critical role in the performance and scalability of multi-GPU accelerated applications. With the ever-increasing methods and types of communication, it is often hard for the programmer to know the exact amount and type of communication taking place in an application. Though there are prior works that detect communication in distributed systems for MPI and in multi-threaded applications on shared memory systems, to our knowledge, none of these works identify intra-node GPU communication. We propose a tool, ComScribe, that identifies and categorizes types of communication among all GPU-GPU and CPU-GPU pairs in a node. Built on top of NVIDIA’s profiler nvprof, ComScribe visualizes data movement as a communication matrix or bar chart for explicit communication primitives, Unified Memory operations, and Zero-copy Memory transfers. To validate our tool on 16 GPUs, we present communication patterns of 8 micro- and 3 macro-benchmarks from the NVIDIA, Comm|Scope, and MGBench benchmark suites. To demonstrate the tool's capabilities in real-life applications, we also present insightful communication matrices of two deep neural network models. All in all, ComScribe can guide the programmer in identifying which groups of GPUs communicate in what volume by using which primitives. This offers avenues to detect performance bottlenecks and, more importantly, communication bugs in an application. © 2021, Springer Nature Switzerland AG. (A sketch of the communication-matrix idea appears after this list.)
  • Publication
    Monitoring collective communication among GPUs
    (Springer International Publishing AG, 2022) Soytürk, Muhammet Abdullah (PhD Student, Graduate School of Sciences and Engineering); Akhtar, Palwisha (Master Student, Graduate School of Sciences and Engineering); Tezcan, Erhan (Master Student, Graduate School of Sciences and Engineering); Erten, Didem Unat (Faculty Member, College of Engineering); Department of Computer Engineering
    Communication among devices in multi-GPU systems plays an important role in terms of performance and scalability. In order to optimize an application, programmers need to know the type and amount of communication happening among GPUs. Although there are prior works that gather this information for MPI applications on distributed systems and multi-threaded applications on shared memory systems, there is no tool that identifies communication among GPUs. Our prior work, ComScribe, presents a point-to-point (P2P) communication detection tool for GPUs sharing a common host. In this work, we extend ComScribe to identify communication among GPUs for collective and P2P communication primitives in NVIDIA's NCCL library. In addition to P2P communications, collective communications are commonly used in HPC and AI workloads; thus, it is important to monitor the data movement induced by collectives. Our tool extracts the size and frequency of data transfers in an application and visualizes them as a communication matrix. To demonstrate the tool in action, we present communication matrices and some statistics for two applications from the machine translation and image classification domains. (A sketch of collecting such collective-communication statistics appears after this list.)
  • Publication
    Mixed and multi-precision SpMV for GPUs with row-wise precision selection
    (IEEE Computer Society, 2022) Kaya, Kamer; Tezcan, Erhan (Master Student, Graduate School of Sciences and Engineering); Torun, Tuğba (Researcher); Koşar, Fahrican (Master Student, Graduate School of Sciences and Engineering); Erten, Didem Unat (Faculty Member, College of Engineering); Department of Computer Engineering
    Sparse Matrix-Vector Multiplication (SpMV) is one of the key memory-bound kernels commonly used in industrial and scientific applications. To improve its data movement and benefit from higher compute rates, there have been several efforts to utilize mixed precision in SpMV. Most of the prior art focuses on performing the entire SpMV in single precision within the bigger context of an iterative solver (e.g., CG, GMRES). In this work, we are interested in a more fine-grained mixed-precision SpMV, where the level of precision is decided for each element in the matrix to be used in a single operation. We extend an existing entry-wise precision-based approach by deciding precisions per row, motivated by the granularity of parallelism on a GPU, where groups of threads process rows in CSR-based matrices. We propose mixed-precision CSR storage methods with row permutations and describe their greater efficiency and load balancing compared to the existing method. We also consider a multi-precision case, where single- and double-precision copies of the matrix are stored in advance, and further extend our mixed-precision SpMV approach to comply with it. As such, we leverage mixed-precision SpMV to obtain a multi-precision Jacobi method that is faster than, yet almost as accurate as, the double-precision Jacobi implementation, and we further evaluate a multi-precision cardiac modeling algorithm. We demonstrate the effectiveness of the proposed SpMV methods on an extensive dataset of real-valued large sparse matrices from the SuiteSparse Matrix Collection using an NVIDIA V100 GPU. (A sketch of row-wise precision selection for CSR SpMV appears after this list.)
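
Illustrative sketches of the ideas in the three abstracts follow. First, a minimal Python sketch of the communication-matrix idea from the ComScribe abstract; the per-transfer records and device numbering are assumptions made for the example, not ComScribe's actual interface or output format.

```python
# Minimal sketch of a GPU communication matrix; record format is hypothetical.
import numpy as np

# Hypothetical per-transfer records, e.g. extracted from a profiler trace:
# (source device, destination device, bytes moved). Device -1 denotes the host (CPU).
transfers = [
    (-1, 0, 4 << 20),   # Host -> GPU0 copy of 4 MiB
    (0, 1, 16 << 20),   # GPU0 -> GPU1 peer-to-peer copy
    (1, 0, 16 << 20),
    (0, -1, 4 << 20),
]

def communication_matrix(transfers, num_gpus):
    """Accumulate transfer volumes into a (num_gpus+1) x (num_gpus+1) matrix.
    Index 0 is the host; index i+1 is GPU i."""
    mat = np.zeros((num_gpus + 1, num_gpus + 1), dtype=np.int64)
    for src, dst, nbytes in transfers:
        mat[src + 1, dst + 1] += nbytes
    return mat

mat = communication_matrix(transfers, num_gpus=2)
labels = ["Host", "GPU0", "GPU1"]
print("bytes moved (row = source, column = destination):")
for name, row in zip(labels, mat):
    print(f"{name:>5}: " + " ".join(f"{v:>10d}" for v in row))
```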
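
Next, a rough sketch of accumulating size and frequency statistics for collective calls, in the spirit of the NCCL-monitoring work above. The intercepted-call records and the ring-based attribution of traffic are illustrative assumptions; the actual data movement depends on the algorithm and topology NCCL selects.

```python
# Rough sketch: per-collective statistics plus a ring-attributed communication matrix.
from collections import defaultdict
import numpy as np

# Hypothetical intercepted calls: (collective name, participating GPU ids, bytes per rank).
calls = [
    ("AllReduce", [0, 1, 2, 3], 32 << 20),
    ("AllReduce", [0, 1, 2, 3], 32 << 20),
    ("Broadcast", [0, 1, 2, 3], 8 << 20),
]

num_gpus = 4
stats = defaultdict(lambda: {"count": 0, "bytes": 0})
matrix = np.zeros((num_gpus, num_gpus), dtype=np.int64)

for name, gpus, nbytes in calls:
    stats[name]["count"] += 1
    stats[name]["bytes"] += nbytes * len(gpus)
    # Attribute traffic along a ring, one common NCCL communication pattern;
    # real per-link volumes differ by algorithm (ring vs. tree) and topology.
    for i, src in enumerate(gpus):
        dst = gpus[(i + 1) % len(gpus)]
        matrix[src, dst] += nbytes

for name, s in stats.items():
    print(f"{name}: {s['count']} calls, {s['bytes']} bytes total")
print("ring-attributed communication matrix:\n", matrix)
```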
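
Finally, a small sketch of row-wise mixed-precision SpMV as described in the third abstract. The row-selection rule (a threshold on the smallest nonzero magnitude) is a placeholder, not the paper's actual criterion, and scipy on the CPU stands in for a GPU CSR kernel.

```python
# Sketch of row-wise mixed-precision SpMV on a CSR matrix (placeholder criterion).
import numpy as np
import scipy.sparse as sp

def split_rows_by_precision(A, threshold=1e-4):
    """Placeholder rule: rows whose smallest nonzero magnitude is at least
    `threshold` are treated as safe for float32; the rest stay in float64."""
    A = A.tocsr()
    low_rows, high_rows = [], []
    for i in range(A.shape[0]):
        vals = A.data[A.indptr[i]:A.indptr[i + 1]]
        if vals.size and np.abs(vals).min() >= threshold:
            low_rows.append(i)       # float32 rows
        else:
            high_rows.append(i)      # float64 rows
    return np.array(low_rows, dtype=int), np.array(high_rows, dtype=int)

def mixed_precision_spmv(A, x):
    """Compute y = A @ x with per-row precision and scatter results back."""
    A = A.tocsr()
    low, high = split_rows_by_precision(A)
    y = np.empty(A.shape[0], dtype=np.float64)
    if low.size:
        y[low] = (A[low].astype(np.float32) @ x.astype(np.float32)).astype(np.float64)
    if high.size:
        y[high] = A[high] @ x
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = sp.random(1000, 1000, density=0.01, random_state=rng, format="csr")
    x = rng.standard_normal(1000)
    err = np.linalg.norm(mixed_precision_spmv(A, x) - A @ x) / np.linalg.norm(A @ x)
    print(f"relative error vs. full double precision: {err:.2e}")
```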