2025-05-09

Understanding Stragglers in Large Model Training Using What-if Analysis


ABSTRACT

Large language model (LLM) training is one of the most demanding distributed computations today, often requiring thousands of GPUs with frequent synchronization across machines. This workload pattern makes it susceptible to stragglers, where training can be stalled by a few slow workers. At ByteDance, we find that stragglers are not always simply caused by hardware failures, but can arise from multiple complex factors. This work presents a comprehensive study of straggler issues in LLM training, using a five-month trace collected from our ByteDance LLM training cluster. The core methodology is what-if analysis, which simulates the scenario without any stragglers and contrasts it with the actual case. We use this method to study the following questions: (1) how often do stragglers affect training jobs, and what effect do they have on job performance; (2) do stragglers exhibit temporal or spatial patterns; and (3) what are the potential root causes of stragglers?
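
To make the idea of a what-if comparison concrete, here is a minimal sketch under strong simplifying assumptions. It is not the simulator used in the paper. It assumes a fully synchronous job in which every step ends when the slowest worker finishes, and it takes the per-step median worker time as a stand-in for the straggler-free duration. The function names (`what_if_slowdown`, `time_if_worker_fixed`) and the toy trace are hypothetical.

```python
from statistics import median


def step_time(worker_times):
    """A synchronous step finishes only when its slowest worker does."""
    return max(worker_times)


def what_if_slowdown(trace):
    """Ratio of actual job time to a hypothetical straggler-free job time.

    trace: list of steps, each a list of per-worker durations in seconds.
    Assumption: without stragglers, every worker in a step would take the
    median duration observed in that step.
    """
    actual = sum(step_time(step) for step in trace)
    ideal = sum(median(step) for step in trace)
    return actual / ideal


def time_if_worker_fixed(trace, worker):
    """Job time if only one worker were replaced by a median-speed worker."""
    total = 0.0
    for step in trace:
        fixed = list(step)
        fixed[worker] = median(step)
        total += step_time(fixed)
    return total


# Toy trace (made up): 4 workers, 3 steps; worker 3 is slow in the second step.
trace = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]

slowdown = what_if_slowdown(trace)
print(f"slowdown vs. straggler-free scenario: {slowdown:.2f}x")
print(f"job time if worker 3 were fixed: {time_if_worker_fixed(trace, 3):.1f}s")
```

In this toy model, a slowdown close to 1.0 means stragglers cost the job little. Repeating the single-worker variant for each worker hints at whether the slowdown is concentrated on a few workers, which relates to the spatial-pattern question above.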

AUTHORS

Jinkun Lin, Ziheng Jiang, Zuquan Song, Sida Zhao, Menghan Yu, Zhanghan Wang, Chenyuan Wang, Zuocheng Shi, Xiang Shi, Wei Jia, Zherui Liu, Shuguang Wang, Haibin Lin, Xin Liu, Aurojit Panda, Jinyang Li

VENUE

OSDI 2025

Copyright © 2026 Bytedance Seed
Disclaimer
Contact us : seed.feedback@bytedance.com