Publication:
A split execution model for SpTRSV

dc.contributor.coauthorYilmaz, Buse
dc.contributor.departmentN/A
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.kuauthorAhmad, Najeeb
dc.contributor.kuauthorErten, Didem Unat
dc.contributor.kuprofilePhD Student
dc.contributor.kuprofileFaculty Member
dc.contributor.otherDepartment of Computer Engineering
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokidN/A
dc.contributor.yokid219274
dc.date.accessioned2024-11-09T23:47:52Z
dc.date.issued2021
dc.description.abstractSparse Triangular Solve (SpTRSV) is an important and extensively used kernel in scientific computing. Parallelism within SpTRSV depends upon matrix sparsity pattern and, in many cases, is non-uniform from one computational step to the next. In cases where the SpTRSV computational steps have contrasting parallelism characteristics- some steps are more parallel, others more sequential in nature, the performance of an SpTRSV algorithm may be limited by the contrasting parallelism characteristics. In this work, we propose a split-execution model for SpTRSV to automatically divide SpTRSV computation into two sub-SpTRSV systems and an SpMV, such that one of the sub-SpTRSVs has more parallelism than the other. Each sub-SpTRSV is then computed using different SpTRSV algorithms, which are possibly executed on different platforms (CPU or GPU). By analyzing the SpTRSV Directed Acyclic Graph (DAG) and matrix sparsity features, we use a heuristics-based approach to (i) automatically determine the suitability of an SpTRSV for split-execution, (ii) find the appropriate split-point, and (iii) execute SpTRSV in a split fashion using two SpTRSV algorithms while managing any required inter-platform communication. Experimental evaluation of the execution model on two CPU-GPU machines with a matrix dataset of 327 matrices from the SuiteSparse Matrix Collection shows that our approach correctly selects the fastest SpTRSV method (split or unsplit) for 88 percent of matrices on the Intel Xeon Gold (6148) + NVIDIA Tesla V100 and 83 percent on the Intel Core I7 + NVIDIA G1080 Ti platform achieving speedups up to 10x and 6.36x respectively.
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.issue11
dc.description.openaccessNO
dc.description.publisherscopeInternational
dc.description.sponsorshipAramco Overseas Company
dc.description.sponsorshipSaudi Aramco The authors would like to thank Aramco Overseas Company and Saudi Aramco for funding this work.
dc.description.volume32
dc.identifier.doi10.1109/TPDS.2021.3074501
dc.identifier.eissn1558-2183
dc.identifier.issn1045-9219
dc.identifier.quartileQ1
dc.identifier.scopus2-s2.0-85104653673
dc.identifier.urihttp://dx.doi.org/10.1109/TPDS.2021.3074501
dc.identifier.urihttps://hdl.handle.net/20.500.14288/14188
dc.identifier.wos655244100005
dc.keywordsSparse matrices
dc.keywordsParallel algorithms
dc.keywordsComputational modeling
dc.keywordsKernel
dc.keywordsGraphics processing units
dc.keywordsFats
dc.keywordsPhased arrays
dc.keywordsSparse triangular solve
dc.keywordsCPU-GPU computing
dc.keywordsHeterogeneous computing
dc.keywordsSparse linear systems
dc.keywordsSpTRSV
dc.keywordsSpTS
dc.languageEnglish
dc.publisherIEEE Computer Society
dc.sourceIEEE Transactions on Parallel and Distributed Systems
dc.subjectComputer science, theory and methods
dc.subjectEngineering, electrical and electronic
dc.titleA split execution model for SpTRSV
dc.typeJournal Article
dspace.entity.typePublication
local.contributor.authorid0000-0002-3460-1256
local.contributor.authorid0000-0002-2351-0770
local.contributor.kuauthorAhmad, Najeeb
local.contributor.kuauthorErten, Didem Unat
relation.isOrgUnitOfPublication89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files