Publication:
Balanced and elastic end-to-end training of dynamic LLMs

dc.conference.date: 2025-11-16 through 2025-11-21
dc.conference.location: St. Louis
dc.contributor.coauthor: Wahib, Mohamed
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.contributor.kuauthor: Soytürk, Muhammet Abdullah
dc.contributor.kuauthor: Erten, Didem Unat
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2026-01-16T08:45:37Z
dc.date.available: 2026-01-16
dc.date.issued: 2025
dc.description.abstract: To reduce the computational and memory overhead of Large Language Models, various approaches have been proposed. These include a) Mixture of Experts (MoEs), where token routing affects compute balance; b) gradual pruning of model parameters; c) dynamically freezing layers; d) dynamic sparse attention mechanisms; e) early exit of tokens as they pass through model layers; and f) Mixture of Depths (MoDs), where tokens bypass certain blocks. While these approaches are effective in reducing overall computation, they often introduce significant workload imbalance across workers. In many cases, this imbalance is severe enough to render the techniques impractical for large-scale distributed training, limiting their applicability to toy models due to poor efficiency. We propose an autonomous dynamic load balancing solution, DynMo, which provably achieves maximum reduction in workload imbalance and adaptively equalizes compute loads across workers in pipeline-parallel training. In addition, DynMo dynamically consolidates computation onto fewer workers without sacrificing training throughput, allowing idle workers to be released back to the job manager. DynMo supports both single-node multi-GPU systems and multi-node GPU clusters, and can be used in practical deployment. Compared to static distributed training solutions such as Megatron-LM and DeepSpeed, DynMo accelerates the end-to-end training of dynamic GPT models by up to 1.23x for MoEs, 3.18x for parameter pruning, 2.23x for layer freezing, 4.02x for sparse attention, 4.52x for early exit, and 1.17x for MoDs. © 2025 Copyright held by the owner/author(s).
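As a rough illustration of the generic idea named in the abstract (not code from the paper): given measured per-layer compute costs, which shift as layers are frozen, pruned, or bypassed, contiguous layers can be repartitioned across pipeline stages so that the heaviest stage is as light as possible. The sketch below uses a binary search over the stage capacity; the function name, the formulation, and the example costs are illustrative assumptions, not DynMo's actual algorithm.

```python
# Minimal illustrative sketch (NOT the paper's algorithm): rebalance a pipeline
# by repartitioning contiguous layers across stages so that the heaviest stage
# carries as little measured work as possible. All names are hypothetical.

def balance_pipeline(layer_costs, num_stages):
    """Split layers into at most `num_stages` contiguous ranges, minimizing
    the maximum per-stage cost via binary search on the stage capacity."""
    if num_stages <= 0 or not layer_costs:
        raise ValueError("need at least one stage and one layer")

    def stages_needed(cap):
        # Greedy count of stages if no stage may exceed `cap`.
        count, load = 1, 0.0
        for cost in layer_costs:
            if load + cost > cap:
                count, load = count + 1, cost
            else:
                load += cost
        return count

    lo, hi = float(max(layer_costs)), float(sum(layer_costs))
    for _ in range(60):  # converge on the minimal feasible capacity
        mid = (lo + hi) / 2
        if stages_needed(mid) <= num_stages:
            hi = mid
        else:
            lo = mid

    # Recover the stage boundaries for the found capacity.
    ranges, start, load = [], 0, 0.0
    for i, cost in enumerate(layer_costs):
        if load + cost > hi and i > start:
            ranges.append((start, i))  # this stage owns layers [start, i)
            start, load = i, cost
        else:
            load += cost
    ranges.append((start, len(layer_costs)))
    return ranges


if __name__ == "__main__":
    # E.g. 8 layers whose costs became uneven after freezing/pruning, 4 stages.
    print(balance_pipeline([4, 1, 1, 6, 2, 2, 3, 5], 4))
    # -> [(0, 3), (3, 4), (4, 7), (7, 8)]  (stage loads 6, 6, 7, 5)
```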
dc.description.fulltext: Yes
dc.description.harvestedfrom: Manual
dc.description.indexedby: Scopus
dc.description.openaccess: Gold OA
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: N/A
dc.identifier.doi: 10.1145/3712285.3759775
dc.identifier.embargo: No
dc.identifier.endpage: 1367
dc.identifier.isbn: 9798400714665
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-105023989396
dc.identifier.startpage: 1351
dc.identifier.uri: https://doi.org/10.1145/3712285.3759775
dc.identifier.uri: https://hdl.handle.net/20.500.14288/32029
dc.keywords: Large language models
dc.keywords: Load balancing
dc.keywords: Pipeline parallelism
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
dc.relation.openaccess: Yes
dc.rights: CC BY (Attribution)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Computer Science
dc.title: Balanced and elastic end-to-end training of dynamic LLMs
dc.type: Conference Proceeding
dspace.entity.type: Publication
person.familyName: Soytürk
person.familyName: Erten
person.givenName: Muhammet Abdullah
person.givenName: Didem Unat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
