Asynchronous AMR on multi-GPUs

dc.contributor.coauthor	Nguyen, Tan
dc.contributor.coauthor	Zhang, Weiqun
dc.contributor.coauthor	Almgren, Ann S.
dc.contributor.coauthor	Shalf, John
dc.contributor.department	Department of Computer Engineering
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.kuauthor	Erten, Didem Unat
dc.contributor.kuauthor	Farooqi, Muhammad Nufail
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.contributor.schoolcollegeinstitute	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned	2024-11-09T23:04:48Z
dc.date.issued	2020
dc.description.abstract	Adaptive Mesh Refinement (AMR) is a computationally efficient and memory-efficient technique for solving partial differential equations. As many supercomputers now employ GPUs, AMR frameworks must evolve to run on large-scale heterogeneous systems. However, employing multiple GPUs and achieving good scalability in AMR is challenging because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system, which simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between the different processing units. Our runtime adapts to various machine configurations and uses a host-resident data model. This data model facilitates the use of streams to overlap CPU-GPU data transfers with computation and to increase device occupancy. We perform strong and weak scaling studies using an advection solver on the Piz Daint supercomputer and achieve high performance.
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.openaccess	NO
dc.description.sponsoredbyTubitakEu	N/A
dc.description.sponsorship	Swiss National Supercomputing Centre (CSCS) [d87]. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project d87.
dc.description.volume	11887
dc.identifier.doi	10.1007/978-3-030-34356-9_11
dc.identifier.eissn	1611-3349
dc.identifier.isbn	978-3-030-34356-9
dc.identifier.isbn	978-3-030-34355-2
dc.identifier.issn	0302-9743
dc.identifier.scopus	2-s2.0-85076831886
dc.identifier.uri	https://doi.org/10.1007/978-3-030-34356-9_11
dc.identifier.uri	https://hdl.handle.net/20.500.14288/8690
dc.identifier.wos	612971700011
dc.keywords	Heterogeneous execution
dc.keywords	Asynchronous runtime
dc.keywords	Communication overlap
dc.language.iso	eng
dc.publisher	Springer International Publishing AG
dc.relation.ispartof	High Performance Computing: ISC High Performance 2019 International Workshops
dc.subject	Computer science
dc.subject	Theory & methods
dc.title	Asynchronous AMR on multi-GPUs
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Farooqi, Muhammad Nufail
local.contributor.kuauthor	Erten, Didem Unat
local.publication.orgunit1	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit1	College of Engineering
local.publication.orgunit2	Department of Computer Engineering
local.publication.orgunit2	Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery	89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
