Publication: A computational-graph partitioning method for training memory-constrained DNNs
dc.contributor.coauthor | Wahib, Mohamed | |
dc.contributor.coauthor | Dikbayir, Doga | |
dc.contributor.coauthor | Belviranli, Mehmet Esat | |
dc.contributor.department | N/A | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Qararyah, Fareed Mohammad | |
dc.contributor.kuauthor | Erten, Didem Unat | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Computer Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | 219274 | |
dc.date.accessioned | 2024-11-09T22:56:38Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Many state-of-the-art Deep Neural Networks (DNNs) have substantial memory requirements. Limited device memory becomes a bottleneck when training those models. We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for DNNs that are represented as computational graphs. ParDNN decides a placement of the DNN's underlying computational-graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN. It requires no modification either at the model level or at the systems-level implementation of its operation kernels. ParDNN partitions DNNs having billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while achieving superlinear scaling for both the batch size and training throughput. ParDNN either outperforms or qualitatively improves upon the related work. | |
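For illustration only, the sketch below shows the kind of decision the abstract describes: assigning computational-graph operations to devices without exceeding per-device memory limits. It is a minimal greedy placement in Python, not ParDNN's actual algorithm; the names Op, place_ops, and mem_limits are hypothetical, and the memory model is deliberately simplified.

    # Illustrative sketch only (not ParDNN's algorithm): naive greedy placement
    # of computational-graph operations onto devices with per-device memory limits.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Op:
        name: str
        mem_bytes: int  # estimated peak memory the op's tensors require

    def place_ops(ops: List[Op], mem_limits: List[int]) -> Dict[str, int]:
        """Assign each op to a device index without exceeding any device's memory."""
        used = [0] * len(mem_limits)
        placement: Dict[str, int] = {}
        # Place the most memory-hungry ops first.
        for op in sorted(ops, key=lambda o: o.mem_bytes, reverse=True):
            candidates = [d for d in range(len(mem_limits))
                          if used[d] + op.mem_bytes <= mem_limits[d]]
            if not candidates:
                raise MemoryError(f"no device can hold {op.name}")
            # Pick the least-loaded device among those that still fit the op.
            dev = min(candidates, key=lambda d: used[d])
            used[dev] += op.mem_bytes
            placement[op.name] = dev
        return placement

Largest-first greedy placement is only a naive heuristic for the memory constraint; the training-time objective that ParDNN also optimizes is ignored in this sketch.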
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsorship | Turkish Science and Technology Research Centre [118E801] | |
dc.description.sponsorship | JST-CREST [JPMJCR19F5] | |
dc.description.sponsorship | Research Council of Norway [270053] | |
dc.description.sponsorship | Authors from Koc University are supported by the Turkish Science and Technology Research Centre Grant No: 118E801. This work was partially supported by JST-CREST under Grant Number JPMJCR19F5. The research presented in this paper has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053. | |
dc.description.volume | 104 | |
dc.identifier.doi | 10.1016/j.parco.2021.102792 | |
dc.identifier.eissn | 1872-7336 | |
dc.identifier.issn | 0167-8191 | |
dc.identifier.quartile | Q2 | |
dc.identifier.scopus | 2-s2.0-85105319626 | |
dc.identifier.uri | http://dx.doi.org/10.1016/j.parco.2021.102792 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/7414 | |
dc.identifier.wos | 654719400005 | |
dc.keywords | DNN | |
dc.keywords | Graph partitioning | |
dc.keywords | Model parallelism | |
dc.language | English | |
dc.publisher | Elsevier | |
dc.source | Parallel Computing | |
dc.subject | Computer science | |
dc.title | A computational-graph partitioning method for training memory-constrained DNNs | |
dc.type | Journal Article | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-3955-2836 | |
local.contributor.authorid | 0000-0002-2351-0770 | |
local.contributor.kuauthor | Qararyah, Fareed Mohammad | |
local.contributor.kuauthor | Erten, Didem Unat | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |