Publication:
Fast multidimensional reduction and broadcast operations on GPU for machine learning

dc.contributor.department: N/A
dc.contributor.department: N/A
dc.contributor.department: N/A
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Dikbayır, Doğa
dc.contributor.kuauthor: Çoban, Enis Berk
dc.contributor.kuauthor: Kesen, İlker
dc.contributor.kuauthor: Yüret, Deniz
dc.contributor.kuauthor: Erten, Didem Unat
dc.contributor.kuprofile: Master Student
dc.contributor.kuprofile: Master Student
dc.contributor.kuprofile: PhD Student
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: N/A
dc.contributor.yokid: N/A
dc.contributor.yokid: 179996
dc.contributor.yokid: 219274
dc.date.accessioned: 2024-11-09T22:45:40Z
dc.date.issued: 2018
dc.description.abstract: Reduction and broadcast operations are commonly used in machine learning algorithms for different purposes. They appear widely in the calculation of the gradient values of a loss function, one of the core computations of neural networks. Both operations are implemented naively in many libraries, usually only for scalar reduction or broadcast; however, to our knowledge, no optimized multidimensional implementations are available. This limits the performance of machine learning models that require these operations to be performed on tensors. In this work, we address the problem and propose two new strategies that extend the existing implementations to operate on tensors. We introduce formal definitions of both operations using tensor notation, investigate their mathematical properties, and exploit these properties to provide an efficient solution for each. We implement our parallel strategies and test them on a CUDA-enabled Tesla K40m GPU accelerator. Our implementations achieve up to 75% of the peak device memory bandwidth across different tensor sizes and dimensions, and significant speedups over the implementations available in the Knet deep learning framework are also achieved for both operations.
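For context, the sketch below shows what naive multidimensional reduction and broadcast kernels look like on the GPU; it only illustrates the operations the abstract refers to and is not the paper's optimized implementation. The kernel names, the fixed 3D shape (d0, d1, d2), the sum operator, and the row-major layout are assumptions made for this example.

// Minimal sketch (not the paper's method): sum-reduce a row-major 3D tensor
// A of shape (d0, d1, d2) along dimension 1, producing B of shape (d0, d2).
// One thread computes one output element.
__global__ void reduce_dim1(const float* A, float* B, int d0, int d1, int d2)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // flat index into B
    if (idx >= d0 * d2) return;
    int i = idx / d2;                                 // index along dim 0
    int k = idx % d2;                                 // index along dim 2
    float acc = 0.0f;
    for (int j = 0; j < d1; ++j)                      // walk the reduced dim
        acc += A[(i * d1 + j) * d2 + k];
    B[i * d2 + k] = acc;
}

// Broadcast sketch: add a vector v of length d1 to A along dimension 1,
// i.e. every (i, k) fiber of A receives the same v. One thread per element of A.
__global__ void broadcast_add_dim1(float* A, const float* v, int d0, int d1, int d2)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // flat index into A
    if (idx >= d0 * d1 * d2) return;
    int j = (idx / d2) % d1;                          // recover the dim-1 index
    A[idx] += v[j];
}

Both kernels read the reduced or broadcast dimension with a stride of d2, so their memory behavior depends on which dimension is chosen; handling arbitrary dimensions and operators efficiently is the problem the paper addresses.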
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.issue: 21
dc.description.openaccess: NO
dc.description.volume: 30
dc.identifier.doi: 10.1002/cpe.4691
dc.identifier.eissn: 1532-0634
dc.identifier.issn: 1532-0626
dc.identifier.scopus: 2-s2.0-85047664332
dc.identifier.uri: http://dx.doi.org/10.1002/cpe.4691
dc.identifier.uri: https://hdl.handle.net/20.500.14288/6138
dc.identifier.wos: 447267900004
dc.keywords: Broadcast
dc.keywords: CUDA
dc.keywords: GPU
dc.keywords: Machine learning
dc.keywords: Multidimensional arrays
dc.keywords: Reduction
dc.keywords: Tensor
dc.language: English
dc.publisher: Wiley
dc.source: Concurrency and Computation: Practice and Experience
dc.subject: Computer science
dc.subject: Software engineering
dc.subject: Computer science
dc.subject: Theory methods
dc.title: Fast multidimensional reduction and broadcast operations on GPU for machine learning
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.authorid: 0000-0003-4673-9612
local.contributor.authorid: 0009-0004-1418-6966
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0002-7039-0046
local.contributor.authorid: 0000-0002-2351-0770
local.contributor.kuauthor: Dikbayır, Doğa
local.contributor.kuauthor: Çoban, Enis Berk
local.contributor.kuauthor: Kesen, İlker
local.contributor.kuauthor: Yüret, Deniz
local.contributor.kuauthor: Erten, Didem Unat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
