Publication:
Precise event sampling-based data locality tools for AMD multicore architectures

dc.contributor.coauthor: Chabbi, Milind
dc.contributor.coauthor: Kelly, Paul H.
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Sasongko, Muhammad Aditya
dc.contributor.kuauthor: Erten, Didem Unat
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.date.accessioned: 2025-01-19T10:34:10Z
dc.date.issued: 2023
dc.description.abstract: We propose ComDetective+, an inter-thread communication analyzer, and ReuseTracker+, a reuse distance analyzer, which leverage hardware features of AMD processors to support low-overhead profiling. Both tools employ the instruction-based sampling (IBS) facility and debug registers in AMD processors to detect inter-thread communication and data reuse. Unlike prior work, ComDetective+ differentiates communication into true and false sharing, and ReuseTracker+ measures reuse distance in private and shared caches with low overhead while also accounting for cache line invalidation. Both tools can attribute communication and reuse to source code lines. To our knowledge, these tools are two of the few profiling tools designed specifically for AMD x86 architectures using IBS. Our tools are timely and relevant given the growing number of data centers and HPC systems based on AMD processors. We evaluate the accuracy and overheads of the proposed tools on a two-socket AMD machine with EPYC 7352 processors. ComDetective+ exhibits high accuracy while introducing 5.14x runtime and 1.4x memory overheads. ReuseTracker+ also displays high accuracy (95%), with 11.76x runtime and 1.46x memory overheads. These overheads are much lower than those of existing simulators and code instrumentation-based tools. Lastly, we demonstrate the tools by using ComDetective+ and ReuseTracker+ to guide the refactoring of two data mining benchmarks, improving their performance by up to 29%.
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.issue: 24
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: European High-Performance Computing Joint Undertaking, Grant/Award Number: 956213; Royal Society-Newton Advanced Fellowship, Grant/Award Number: NAF\R2\202207; Turkish Science and Technology Research Centre, Grant/Award Number: 120N003
dc.description.volume: 35
dc.identifier.doi: 10.1002/cpe.7707
dc.identifier.eissn: 1532-0634
dc.identifier.issn: 1532-0626
dc.identifier.quartile: Q2
dc.identifier.scopus: 2-s2.0-85151974180
dc.identifier.uri: https://doi.org/10.1002/cpe.7707
dc.identifier.uri: https://hdl.handle.net/20.500.14288/26746
dc.identifier.wos: 962646800001
dc.keywords: Debug registers
dc.keywords: Hardware performance counters
dc.keywords: Measurement
dc.keywords: Multicore architectures
dc.keywords: Performance
dc.keywords: Precise event sampling
dc.language.iso: eng
dc.publisher: Wiley
dc.relation.grantno: European High-Performance Computing Joint Undertaking [956213]; Royal Society-Newton Advanced Fellowship [NAF\R2\202207]; Turkish Science and Technology Research Centre [120N003]
dc.relation.ispartof: Concurrency and Computation-Practice & Experience
dc.subject: Computer Science
dc.title: Precise event sampling-based data locality tools for AMD multicore architectures
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.kuauthor: Sasongko, Muhammad Aditya
local.contributor.kuauthor: Erten, Didem Unat
local.publication.orgunit1: College of Engineering
local.publication.orgunit2: Department of Computer Engineering
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files