Publication:
Asynchronous AMR on multi-GPUs

dc.contributor.coauthorNguyen, Tan
dc.contributor.coauthorZhang, Weiqun
dc.contributor.coauthorAlmgren, Ann S.
dc.contributor.coauthorShalf, John
dc.contributor.departmentN/A
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.kuauthorFarooqi, Muhammad Nufail
dc.contributor.kuauthorErten, Didem Unat
dc.contributor.kuprofilePhD Student
dc.contributor.kuprofileFaculty Member
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokidN/A
dc.contributor.yokid219274
dc.date.accessioned2024-11-09T23:04:48Z
dc.date.issued2020
dc.description.abstractAdaptive Mesh Refinement (AMR) is a computationally and memory-efficient technique for solving partial differential equations. As many supercomputers now employ GPUs, AMR frameworks must evolve to adapt to large-scale heterogeneous systems. However, employing multiple GPUs and achieving good scalability in AMR is challenging because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system, which simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between the different processing units. Our runtime adapts to various machine configurations and uses a host-resident data model, which facilitates using streams to overlap CPU-GPU data transfers with computation and to increase device occupancy. We perform strong and weak scaling studies using an Advection solver on the Piz Daint supercomputer and achieve high performance.
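dc.description.noteThe overlap mechanism mentioned in the abstract can be illustrated by a minimal CUDA sketch (not taken from the paper's runtime): pinned, host-resident buffers are split into chunks, and each chunk's host-to-device copy, kernel launch, and copy-back are queued on its own stream so that transfers for one chunk overlap with computation on another. All names and sizes below (advect, nChunks, chunkSize) are hypothetical placeholders.

```cuda
// Illustrative sketch only: stream-based overlap of CPU-GPU transfers
// with computation over a host-resident (pinned) data buffer.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void advect(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;  // placeholder for a real advection update
}

int main() {
    const int nChunks = 4;
    const int chunkSize = 1 << 20;
    float *h, *d;
    cudaMallocHost(&h, (size_t)nChunks * chunkSize * sizeof(float)); // pinned host memory
    cudaMalloc(&d, (size_t)nChunks * chunkSize * sizeof(float));

    cudaStream_t streams[nChunks];
    for (int c = 0; c < nChunks; ++c) cudaStreamCreate(&streams[c]);

    // Copy-in, compute, and copy-out for each chunk run on its own stream,
    // so data movement for one chunk overlaps with kernels of another.
    for (int c = 0; c < nChunks; ++c) {
        size_t off = (size_t)c * chunkSize;
        cudaMemcpyAsync(d + off, h + off, chunkSize * sizeof(float),
                        cudaMemcpyHostToDevice, streams[c]);
        advect<<<(chunkSize + 255) / 256, 256, 0, streams[c]>>>(d + off, chunkSize);
        cudaMemcpyAsync(h + off, d + off, chunkSize * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();

    for (int c = 0; c < nChunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFree(d);
    cudaFreeHost(h);
    printf("done\n");
    return 0;
}
```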
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.openaccessNO
dc.description.sponsorshipSwiss National Supercomputing Centre (CSCS) [d87] This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project d87.
dc.description.volume11887
dc.identifier.doi10.1007/978-3-030-34356-9_11
dc.identifier.eissn1611-3349
dc.identifier.isbn978-3-030-34356-9
dc.identifier.isbn978-3-030-34355-2
dc.identifier.issn0302-9743
dc.identifier.scopus2-s2.0-85076831886
dc.identifier.urihttp://dx.doi.org/10.1007/978-3-030-34356-9_11
dc.identifier.urihttps://hdl.handle.net/20.500.14288/8690
dc.identifier.wos612971700011
dc.keywordsHeterogeneous execution
dc.keywordsAsynchronous runtime
dc.keywordsCommunication overlap
dc.languageEnglish
dc.publisherSpringer International Publishing AG
dc.sourceHigh Performance Computing: ISC High Performance 2019 International Workshops
dc.subjectComputer science
dc.subjectTheory and methods
dc.titleAsynchronous AMR on multi-GPUs
dc.typeConference proceeding
dspace.entity.typePublication
local.contributor.authorid0000-0002-1609-5847
local.contributor.authorid0000-0002-2351-0770
local.contributor.kuauthorFarooqi, Muhammad Nufail
local.contributor.kuauthorErten, Didem Unat
relation.isOrgUnitOfPublication89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery89352e43-bf09-4ef4-82f6-6f9d0174ebae