Publication:
Overlapping data transfers with computation on GPU with tiles

dc.contributor.coauthor: Zhang, Weiqun
dc.contributor.coauthor: Almgren, Ann
dc.contributor.coauthor: Shalf, John
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Bastem, Burak
dc.contributor.kuauthor: Erten, Didem Unat
dc.contributor.kuprofile: Master Student
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: 219274
dc.date.accessioned: 2024-11-09T13:44:41Z
dc.date.issued: 2017
dc.description.abstract: GPUs are employed to accelerate scientific applications; however, they demand considerably more programming effort, particularly because the host and the device have disjoint address spaces. OpenACC and OpenMP 4.0 provide directive-based programming solutions that alleviate this burden; however, synchronous data movement can become a performance bottleneck that prevents applications from taking full advantage of GPUs. We propose a tiling-based programming model and an accompanying library that simplify the development of GPU programs and overlap data movement with computation. The programming model decomposes data and computation into tiles and treats them as the main units of data transfer and execution, which allows transfers to be pipelined so that their latency is hidden. Moreover, partitioning application data into tiles lets the programmer still use the GPU even when the application data cannot fit into device memory. The library leverages C++ lambda functions, OpenACC directives, CUDA streams, and the tiling API from TiDA to support both productivity and performance. We evaluate the library on a data-transfer-intensive kernel and a compute-intensive kernel and compare its speedup against OpenACC and CUDA. The results indicate that the library can hide transfer latency, handle cases where device memory is insufficient, and achieve reasonable performance.
dc.description.fulltext: YES
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsoredbyTubitakEu: EU
dc.description.sponsorship: Office of Advanced Scientific Computing Research in the Department of Energy Office of Science
dc.description.sponsorship: Marie Sklodowska Curie Reintegration Grant by the European Commission
dc.description.sponsorship: Scientific and Technological Research Council of Turkey (TÜBİTAK)
dc.description.sponsorship: European Union (EU)
dc.description.sponsorship: Horizon 2020
dc.description.version: Author's final manuscript
dc.format: pdf
dc.identifier.doi: 10.1109/ICPP.2017.26
dc.identifier.eissn: 1572-9303
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR01374
dc.identifier.issn: 1382-4090
dc.identifier.link: https://doi.org/10.1109/ICPP.2017.26
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85030653621
dc.identifier.uri: https://hdl.handle.net/20.500.14288/3523
dc.identifier.wos: 426952300018
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.grantno: DE-AC02-05CH11231
dc.relation.grantno: 215E185
dc.relation.grantno: 655965
dc.relation.grantno: AC02-05CH11231
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/7859
dc.source: Proceedings of the International Conference on Parallel Processing
dc.subject: Computer science
dc.title: Overlapping data transfers with computation on GPU with tiles
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0002-2351-0770
local.contributor.kuauthor: Bastem, Burak
local.contributor.kuauthor: Erten, Didem Unat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files

Original bundle

Name: 7859.pdf
Size: 519.15 KB
Format: Adobe Portable Document Format