Researcher:
Sasongko, Muhammad Aditya

Job Title

Researcher

First Name

Muhammad Aditya

Last Name

Sasongko

Name

Name Variants

Sasongko, Muhammad Aditya

Search Results

  • Publication
    BindMe: a thread binding library with advanced mapping algorithms
    (Wiley, 2018) Soomro, Pirah Noor; Sasongko, Muhammad Aditya; Erten, Didem Unat; PhD Student; Researcher; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    Binding parallel tasks to cores according to a placement policy is one of the key aspects of achieving good performance on multicore machines because it can reduce on-chip communication among parallel threads. Binding also prevents the operating system from migrating threads, which improves data locality. However, no single mapping policy works best across all kinds of applications and platforms, because each machine has a different topology and each application exhibits a different communication pattern. Determining the best policy for a given application and machine requires extra programming effort. To relieve the programmer of that burden, we introduce BindMe, a thread binding library that helps programmers bind threads to the underlying hardware. BindMe incorporates state-of-the-art mapping algorithms, which use the communication pattern of an application to formulate an efficient task placement policy. We also introduce ChoiceMap, a communication-aware mapping algorithm that respects the mutual priorities of parallel tasks and performs a fair mapping by reducing communication volume among cores. We have tested BindMe and ChoiceMap with various applications from the NAS Parallel Benchmarks and the Rodinia benchmark suite. Our results show that choosing the mapping policy that best suits the application's behavior can increase its performance, and that no single policy gives the best performance across different applications.
  • Publication
    ReuseTracker: fast yet accurate multicore reuse distance analyzer
    (Association for Computing Machinery, 2022) Chabbi, Milind; Sasongko, Muhammad Aditya; Marzijarani, Mandana Bagheri; Erten, Didem Unat; Researcher; Master Student; Faculty Member; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering
    One widely used metric that measures data locality is reuse distance: the number of unique memory locations that are accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance in parallel applications rely on simulators or binary instrumentation tools that incur large performance and memory overheads. Moreover, the existing sampling-based tools are limited to measuring reuse distances of a single thread and discard interactions among threads in multi-threaded programs. In this work, we propose REUSETRACKER, a fast and accurate reuse distance analyzer that leverages existing hardware features in commodity CPUs. REUSETRACKER is designed for multi-threaded programs and takes cache-coherence effects into account. By utilizing hardware features such as performance monitoring units and debug registers, REUSETRACKER can accurately profile reuse distance in parallel applications with much lower overheads than existing tools. It introduces only 2.9x runtime and 2.8x memory overheads. Our tool achieves 92% accuracy when verified against a newly developed configurable benchmark that can generate a variety of reuse distance patterns. We demonstrate the tool's functionality with two use-case scenarios using the PARSEC, Rodinia, and Synchrobench benchmark suites. In the first, REUSETRACKER guides code refactoring in these benchmarks by detecting spatial reuses in shared caches that are also false sharing. In the second, it successfully predicts whether some benchmarks in these suites can benefit from the adjacent cache line prefetch optimization.
  • Publication
    Low-overhead reuse distance profiling tool for multicore
    (Springer International Publishing AG, 2022) Chabbi, Milind; Sasongko, Muhammad Aditya; Erten, Didem Unat; Researcher; Faculty Member; Department of Computer Engineering; College of Engineering
    With the increase in core count in multicore systems, data movement has become one of the main sources of performance slowdown in parallel applications, and data locality has become a critical factor in application optimization. One important locality metric is reuse distance, which indicates the likelihood that a memory access will be a cache hit. In this work, we propose a low-overhead reuse distance profiling tool for multi-threaded applications. Our method relies on hardware features available in commodity CPUs, namely Performance Monitoring Units (PMUs) and debug registers, to detect data reuse in private and shared caches by considering inter-thread cache line invalidations. Unlike prior approaches, our tool is fast and accurate, does not change the program behavior, and can also handle shared cache accesses. Despite its low runtime (2.9x) and memory (2.8x) overheads, our tool achieves 92% accuracy.
  • Publication (Open Access)
    ComDetective: a lightweight communication detection tool for threads
    (Association for Computing Machinery (ACM), 2019) Chabbi, Milind; Sasongko, Muhammad Aditya; Akhtar, Palwisha; Erten, Didem Unat; Researcher; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    Inter-thread communication is a vital performance indicator in shared-memory systems. Prior work on identifying inter-thread communication employed hardware simulators or binary instrumentation and suffered from inaccuracy or high overheads (both space and time), making it impractical for production use. We propose COMDETECTIVE, which produces accurate communication matrices and introduces low runtime and memory overheads, thus making it practical for production use. COMDETECTIVE employs hardware performance counters to sample memory-access events and uses hardware debug registers to sample communicating pairs of threads. COMDETECTIVE can classify communication between threads as true or false sharing. Its runtime and memory overheads are only 1.30x and 1.27x, respectively, for the 18 applications studied under a 500K sampling period. Using COMDETECTIVE, we produce insightful communication matrices for microbenchmarks, the PARSEC benchmark suite, and several CORAL applications, and compare the generated matrices against their MPI counterparts. Guided by COMDETECTIVE, we optimize a few codes and achieve up to 13% speedup.
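
The reuse distance metric defined in the ReuseTracker abstract above (the number of unique memory locations accessed between two consecutive accesses to the same location) can be illustrated with a minimal sketch. This is an exact, single-threaded computation over an already recorded trace. It is not the sampling-based, hardware-assisted method the listed tools use, and the function name `reuse_distances` and the toy trace are hypothetical, chosen only for illustration.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Return the reuse distance of every access in `trace`.

    The distance is the number of distinct locations touched since the
    previous access to the same location, or None for a first access.
    This naive version costs O(N * M) for N accesses and M locations.
    """
    stack: List[Hashable] = []  # LRU stack: most recently used is last
    distances: List[Optional[int]] = []
    for location in trace:
        if location in stack:
            position = stack.index(location)
            # Every entry above `position` was touched since the last use.
            distances.append(len(stack) - 1 - position)
            stack.pop(position)
        else:
            distances.append(None)  # first access: no earlier use exists
        stack.append(location)
    return distances


if __name__ == "__main__":
    # Toy trace of symbolic addresses (an assumption, not measured data).
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
    # prints [None, None, None, 2, 2, 0]
```

In a fully associative LRU cache, an access hits when its reuse distance is smaller than the cache capacity in lines, which is why a histogram of these distances is commonly read as a locality profile.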