Researcher:
Bastem, Burak

Job Title

Master Student

First Name

Burak

Last Name

Bastem

Name

Name Variants

Bastem, Burak

Search Results

Now showing 1 - 3 of 3
  • Publication
    TiDA: High-level programming abstractions for data locality management
    (Springer International Publishing Ag, 2016) Nguyen, Tan; Zhang, Weiqun; Michelogiannakis, George; Almgren, Ann; Shalf, John; N/A; Department of Computer Engineering; N/A; Farooqi, Muhammad Nufail; Erten, Didem Unat; Bastem, Burak; PhD Student; Faculty Member; Master Student; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; Graduate School of Sciences and Engineering; N/A; 219275; N/A
    The high energy cost of data movement relative to computation makes data locality management critically important in programs. Managing data locality manually is not a trivial task and also complicates programming. Tiling is a well-known approach that provides both data locality and parallelism in an application. However, there is no standard programming construct to express tiling at the application level. We have developed a multicore programming model, TiDA, based on tiling and implemented the model as C++ and Fortran libraries. The proposed programming model has three high-level abstractions: tiles, regions, and the tile iterator. These abstractions in the library hide the details of data decomposition, cache locality optimizations, and memory affinity management in the application. In this paper we unveil the internals of the library and demonstrate the performance and programmability advantages of the model on five applications on multiple NUMA nodes. The library achieves up to 2.10x speedup over OpenMP in a single compute node for simple kernels, and up to 22x improvement over a single thread for a more complex combustion proxy application (SMC) on 24 cores. The MPI+TiDA implementation of geometric multigrid demonstrates a 30.9% performance improvement over MPI+OpenMP when scaling to 3072 cores (excluding MPI communication overheads, 8.5% otherwise).
  • Publication
    Tiling-based programming model for structured grids on GPU clusters
    (Assoc Computing Machinery, 2020) N/A; Department of Computer Engineering; Bastem, Burak; Erten, Didem Unat; Master Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 219274
    Currently, more than 25% of supercomputers employ GPUs due to their massively parallel and power-efficient architectures. However, programming GPUs efficiently in a large-scale system is a demanding task not only for computational scientists but also for programming experts, as multi-GPU programming requires managing distinct address spaces, generating GPU-specific code, and handling inter-device communication. To ease the programming effort, we propose a tiling-based high-level GPU programming model for structured grid problems. The model abstracts data decomposition, memory management, and generation of GPU-specific code, and hides all types of data transfer overheads. We demonstrate the effectiveness of the programming model on a heat simulation and a real-life cardiac modeling application on a single GPU, on a single node with multiple GPUs, and on multiple nodes with multiple GPUs. We also present performance comparisons under different hardware and software configurations. The results show that the programming model successfully overlaps communication and provides good speedup on 192 GPUs.
  • Publication (Open Access)
    Overlapping data transfers with computation on GPU with tiles
    (Institute of Electrical and Electronics Engineers (IEEE), 2017) Zhang, Weiqun; Almgren, Ann; Shalf, John; Department of Computer Engineering; Bastem, Burak; Erten, Didem Unat; Master Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; N/A; 219274
    GPUs are employed to accelerate scientific applications; however, they require much more programming effort, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck that prevents applications from taking full advantage of GPUs. We propose a tiling-based programming model and its library, which simplify the development of GPU programs and overlap data movement with computation. The programming model decomposes the data and computation into tiles and treats them as the main data transfer and execution units, which enables pipelining the transfers to hide the transfer latency. Moreover, partitioning application data into tiles allows the programmer to still take advantage of the GPU even when the application data cannot fit into the device memory. The library leverages C++ lambda functions, OpenACC directives, CUDA streams, and the tiling API from TiDA to support both productivity and performance. We show the performance of the library on a data transfer-intensive kernel and a compute-intensive kernel and compare its speedup against OpenACC and CUDA. The results indicate that the library can hide the transfer latency, handle cases where device memory is insufficient, and achieve reasonable performance.
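
The abstracts above share one idea: split a structured grid into tiles and use each tile as the unit of work. The following is a minimal sketch of that idea in plain Python, not the TiDA API or the authors' implementation. The names `tiles` and `scale_tiled`, the end-exclusive index convention, and the pointwise scaling kernel are all assumptions chosen for illustration, and the grid is assumed to be non-empty and rectangular.

```python
from typing import Iterator, List, Tuple

# Hypothetical tile descriptor: (row_start, row_end, col_start, col_end),
# with end indices exclusive. This is not a TiDA type.
Tile = Tuple[int, int, int, int]


def tiles(n_rows: int, n_cols: int, tile_rows: int, tile_cols: int) -> Iterator[Tile]:
    """Yield tiles that together cover an n_rows x n_cols region exactly once."""
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            # Edge tiles are clipped so they never run past the region.
            yield (r0, min(r0 + tile_rows, n_rows), c0, min(c0 + tile_cols, n_cols))


def scale_tiled(grid: List[List[float]], factor: float,
                tile_rows: int, tile_cols: int) -> List[List[float]]:
    """Apply a pointwise kernel (multiply by factor) one tile at a time."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r0, r1, c0, c1 in tiles(n_rows, n_cols, tile_rows, tile_cols):
        # Each tile is an independent unit of work. In a real system this is
        # where a tile could be assigned to a thread or staged to a device.
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = factor * grid[r][c]
    return out
```

Because every tile touches a disjoint part of the output, the tiles can in principle be processed in any order or concurrently, which is the property that makes tile-level scheduling and pipelined transfers possible.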