Publication: Asynchronous AMR on multi-GPUs
dc.contributor.coauthor | Nguyen, Tan | |
dc.contributor.coauthor | Zhang, Weiqun | |
dc.contributor.coauthor | Almgren, Ann S. | |
dc.contributor.coauthor | Shalf, John | |
dc.contributor.department | N/A | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Farooqi, Muhammad Nufail | |
dc.contributor.kuauthor | Erten, Didem Unat | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | 219274 | |
dc.date.accessioned | 2024-11-09T23:04:48Z | |
dc.date.issued | 2020 | |
dc.description.abstract | Adaptive Mesh Refinement (AMR) is a computationally efficient and memory-efficient technique for solving partial differential equations. As many supercomputers employ GPUs, AMR frameworks must evolve to adapt to large-scale heterogeneous systems. However, it is challenging to employ multiple GPUs and achieve good scalability in AMR because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system, which simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between different processing units. Our runtime adapts to various machine configurations and uses a host-resident data model. It facilitates the use of streams to overlap CPU-GPU data transfers with computation and to increase device occupancy. We perform strong and weak scaling studies using an advection solver on the Piz Daint supercomputer and achieve high performance. | |
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | NO | |
dc.description.sponsorship | Swiss National Supercomputing Centre (CSCS) [d87]. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project d87. | |
dc.description.volume | 11887 | |
dc.identifier.doi | 10.1007/978-3-030-34356-9_11 | |
dc.identifier.eissn | 1611-3349 | |
dc.identifier.isbn | 978-3-030-34356-9 | |
dc.identifier.isbn | 978-3-030-34355-2 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.scopus | 2-s2.0-85076831886 | |
dc.identifier.uri | http://dx.doi.org/10.1007/978-3-030-34356-9_11 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/8690 | |
dc.identifier.wos | 612971700011 | |
dc.keywords | Heterogeneous execution | |
dc.keywords | Asynchronous runtime | |
dc.keywords | Communication overlap | |
dc.language | English | |
dc.publisher | Springer International Publishing AG | |
dc.source | High Performance Computing: ISC High Performance 2019 International Workshops | |
dc.subject | Computer science | |
dc.subject | Theory and methods | |
dc.title | Asynchronous AMR on multi-GPUs | |
dc.type | Conference proceeding | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-1609-5847 | |
local.contributor.authorid | 0000-0002-2351-0770 | |
local.contributor.kuauthor | Farooqi, Muhammad Nufail | |
local.contributor.kuauthor | Erten, Didem Unat | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |