Publication:
A computational-graph partitioning method for training memory-constrained DNNs

dc.contributor.coauthor: Wahib, Mohamed
dc.contributor.coauthor: Dikbayir, Doga
dc.contributor.coauthor: Belviranli, Mehmet Esat
dc.contributor.department: N/A
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Qararyah, Fareed Mohammad
dc.contributor.kuauthor: Erten, Didem Unat
dc.contributor.kuprofile: PhD Student
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: 219274
dc.date.accessioned: 2024-11-09T22:56:38Z
dc.date.issued: 2021
dc.description.abstract: Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training such models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs represented as computational graphs. ParDNN decides a placement of a DNN's underlying computational-graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN. It requires no modification to the model or to the systems-level implementation of its operation kernels. ParDNN partitions DNNs with billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of five very large models while achieving superlinear scaling for both batch size and training throughput. ParDNN either outperforms or qualitatively improves upon the related work.
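For intuition only, the following minimal Python sketch illustrates the kind of memory-constrained device placement the abstract describes: operations of a computational graph are greedily assigned to devices so that each device's memory budget is respected while compute load stays balanced. This is an illustrative toy, not ParDNN's actual algorithm; all names (Op, place_ops, mem_limits) are hypothetical.

    # Illustrative sketch only (not the ParDNN algorithm): greedy placement of
    # computational-graph operations onto devices under per-device memory budgets.
    from dataclasses import dataclass

    @dataclass
    class Op:
        name: str
        mem_bytes: int   # estimated memory footprint of the operation
        cost: float      # estimated compute time

    def place_ops(ops, mem_limits):
        """Assign each op to the least-loaded device that still has memory room."""
        used_mem = [0] * len(mem_limits)
        load = [0.0] * len(mem_limits)
        placement = {}
        # Place the most memory-hungry operations first.
        for op in sorted(ops, key=lambda o: o.mem_bytes, reverse=True):
            candidates = [d for d in range(len(mem_limits))
                          if used_mem[d] + op.mem_bytes <= mem_limits[d]]
            if not candidates:
                raise RuntimeError(f"no device can hold {op.name}")
            d = min(candidates, key=lambda i: load[i])  # balance compute load
            placement[op.name] = d
            used_mem[d] += op.mem_bytes
            load[d] += op.cost
        return placement

    # Example: three ops placed across two devices with 8 GB budgets each.
    ops = [Op("matmul_1", 4 * 2**30, 2.0),
           Op("conv_1", 3 * 2**30, 1.5),
           Op("relu_1", 2**20, 0.1)]
    print(place_ops(ops, mem_limits=[8 * 2**30, 8 * 2**30]))

Unlike this greedy toy, the paper's approach also accounts for the graph's dependency structure when minimizing training time; the sketch only conveys the memory-budgeted placement idea.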
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsorship: Turkish Science and Technology Research Centre [118E801]
dc.description.sponsorship: JST-CREST [JPMJCR19F5]
dc.description.sponsorship: Research Council of Norway [270053]. Authors from Koc University are supported by the Turkish Science and Technology Research Centre, Grant No. 118E801. This work was partially supported by JST-CREST under Grant Number JPMJCR19F5. The research presented in this paper has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.
dc.description.volume: 104
dc.identifier.doi: 10.1016/j.parco.2021.102792
dc.identifier.eissn: 1872-7336
dc.identifier.issn: 0167-8191
dc.identifier.quartile: Q2
dc.identifier.scopus: 2-s2.0-85105319626
dc.identifier.uri: http://dx.doi.org/10.1016/j.parco.2021.102792
dc.identifier.uri: https://hdl.handle.net/20.500.14288/7414
dc.identifier.wos: 654719400005
dc.keywords: DNN
dc.keywords: Graph partitioning
dc.keywords: Model parallelism
dc.language: English
dc.publisher: Elsevier
dc.source: Parallel Computing
dc.subject: Computer science
dc.title: A computational-graph partitioning method for training memory-constrained DNNs
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.authorid: 0000-0002-3955-2836
local.contributor.authorid: 0000-0002-2351-0770
local.contributor.kuauthor: Qararyah, Fareed Mohammad
local.contributor.kuauthor: Erten, Didem Unat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
