Infrastructures
The Seed-Infrastructures team is responsible for distributed training of large models, reinforcement learning frameworks, high-performance inference, compilers for heterogeneous hardware, and related systems work.

Research Directions

Ultra-Large-Scale Distributed Training
Research on ultra-large-scale training clusters: improving training stability and MFU, and enabling cross-cluster, low-precision, fault-tolerant, and elastic training.
Large-scale
Stability

Reinforcement Learning Systems
Research on end-to-end reinforcement learning systems for large models, designing next-generation systems for dynamic workloads, complex agent/environment interactions, heterogeneous resources, and multimodal scenarios.
Reinforcement learning
Agent
Optimization

Parallel Inference Solutions
Research on resolving the compute and memory-access bottlenecks of inference, multi-node inference, and parallel inference schemes and scheduling optimization for heterogeneous hardware.
Inference
Parallel

Joint Optimization of Next-Generation Models and Hardware Systems
Research on more advanced model architectures, training paradigms, and inference paradigms by co-designing next-generation hardware systems with next-generation architectures for generation and understanding models.
Systems-algorithm co-design
Model architecture

Compiler Optimization for Heterogeneous Hardware
Research on compiler optimization of high-performance operators for new hardware architectures, and joint optimization of computation and communication.
Heterogeneous systems
Compiler
Selected Papers

2025.03.20
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Transformers have found extensive applications across various domains due to their powerful fitting capabilities. This success can be partially attributed to their inherent nonlinearity. Thus, in addition to the ReLU function employed in the original transformer architecture, researchers have explored alternative modules such as GeLU and SwishGLU to enhance nonlinearity and thereby augment representational capacity. In this paper, we propose a novel category of polynomial composition activations (PolyCom), designed to optimize the dynamics of transformers. Theoretically, we provide a comprehensive mathematical analysis of PolyCom, highlighting its enhanced expressivity and efficacy relative to other activation functions. Notably, we demonstrate that networks incorporating PolyCom achieve the optimal approximation rate.
Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma
Foundation
LLM
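To make the idea in the abstract concrete, here is a minimal sketch of a polynomial-composition activation, assuming a PolyReLU-style form y = a_0 + a_1·ReLU(x) + … + a_r·ReLU(x)^r with learnable coefficients. The class name and initialization are illustrative assumptions; the paper's exact parameterization (including other composition variants) may differ.

```python
# Hedged sketch of a polynomial-composition activation ("PolyReLU"-style).
# Assumption: y = sum_i a_i * ReLU(x)**i with learnable coefficients a_i.
import torch
import torch.nn as nn


class PolyReLU(nn.Module):
    def __init__(self, order: int = 3):
        super().__init__()
        # Learnable coefficients a_0..a_order, initialized so the activation
        # starts out close to a plain ReLU (a_1 = 1, all others 0).
        coeffs = torch.zeros(order + 1)
        coeffs[1] = 1.0
        self.coeffs = nn.Parameter(coeffs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = torch.relu(x)
        # Compose powers of ReLU(x): a_0 + a_1*r + a_2*r^2 + ...
        out = torch.zeros_like(x)
        for i, a in enumerate(self.coeffs):
            out = out + a * r.pow(i)
        return out


if __name__ == "__main__":
    act = PolyReLU(order=3)
    x = torch.randn(4, 8)
    print(act(x).shape)  # torch.Size([4, 8])
```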

2025.03.18
Hyper-Connections
We present hyper-connections, a simple yet effective method that can serve as an alternative to residual connections. This approach specifically addresses common drawbacks observed in residual connection variants, such as the seesaw effect between gradient vanishing and representation collapse. Theoretically, hyper-connections allow the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. We conduct experiments focusing on the pre-training of large language models, including dense and sparse models, where hyper-connections show significant performance improvements over residual connections. Additional experiments conducted on vision tasks also demonstrate similar improvements. We anticipate that this method will be broadly applicable and beneficial across a wide range of AI problems.
Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou
Foundation
LLM
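As a rough illustration of the hyper-connection idea in the abstract, the sketch below expands the residual stream into n parallel streams and uses learnable weights to decide how the streams feed a layer and how the layer output is written back. This is a simplified static variant with hypothetical names, not the paper's exact (dynamic) formulation.

```python
# Hedged sketch of a hyper-connection-style block with n parallel streams.
# Assumptions: static learnable read/write/mix weights; shapes are illustrative.
import torch
import torch.nn as nn


class HyperConnection(nn.Module):
    def __init__(self, layer: nn.Module, n_streams: int = 4):
        super().__init__()
        self.layer = layer
        # How much each stream contributes to the layer input.
        self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
        # How strongly the layer output is written back to each stream.
        self.write = nn.Parameter(torch.ones(n_streams))
        # Stream-to-stream mixing, initialized to the identity (plain residual).
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, dim)
        layer_in = torch.einsum("n,nbsd->bsd", self.read, streams)
        layer_out = self.layer(layer_in)
        mixed = torch.einsum("mn,nbsd->mbsd", self.mix, streams)
        return mixed + self.write.view(-1, 1, 1, 1) * layer_out


if __name__ == "__main__":
    block = HyperConnection(nn.Linear(16, 16), n_streams=4)
    h = torch.randn(4, 2, 8, 16)  # n_streams x batch x seq x dim
    print(block(h).shape)         # torch.Size([4, 2, 8, 16])
```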

2025.03.01
TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice
The Mixture of Experts (MoE) architecture has emerged as a promising solution to reduce computational overhead by selectively activating subsets of model parameters. The effectiveness of MoE models depends primarily on their routing mechanisms, with the widely adopted Top-K routing scheme used for activating experts. However, the Top-K scheme has notable limitations, including unnecessary activations and underutilization of experts. In this work, rather than modifying the routing mechanism as done in previous studies, we propose the Ternary Choice MoE (TC-MoE), a novel approach that expands the expert space by applying the ternary set {-1, 0, 1} to each expert. This expansion allows more efficient and effective expert activations without incurring significant computational costs. Additionally, given the unique characteristics of the expanded expert space, we introduce a new load balance loss and reward loss to ensure workload balance and achieve a flexible trade-off between effectiveness and efficiency. Extensive experiments demonstrate that TC-MoE achieves an average improvement of over 1.1% compared with traditional approaches, while reducing the average number of activated experts by up to 9%. These results confirm that TC-MoE effectively addresses the inefficiencies of conventional routing schemes, offering a more efficient and scalable solution for MoE-based large language models. Code and models are available at https://github.com/stiger1000/TC-MoE.
Shen Yan, Xingyan Bin, Sijun Zhang, Yisen Wang, Zhouchen Lin
Foundation
LLM
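The sketch below illustrates the ternary expert choice described in the abstract: each expert E_i is expanded into virtual experts {-E_i, 0, +E_i}, the router scores the expanded space, and selecting the "0" copy costs no expert compute. The auxiliary load-balance and reward losses are omitted and all names are assumptions for illustration; see the released code at https://github.com/stiger1000/TC-MoE for the actual implementation.

```python
# Hedged sketch of ternary expert choice: route over 3*E virtual experts
# {-E_i, 0, +E_i}; the "0" choice skips expert computation entirely.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TernaryChoiceMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.num_experts, self.top_k = num_experts, top_k
        # Router scores 3 * num_experts virtual experts, ordered (-E_i, 0_i, +E_i).
        self.router = nn.Linear(dim, 3 * num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim)
        probs = F.softmax(self.router(x), dim=-1)       # (tokens, 3E)
        weights, idx = probs.topk(self.top_k, dim=-1)   # (tokens, k)
        expert_id = idx % self.num_experts               # which expert
        sign = idx.div(self.num_experts, rounding_mode="floor") - 1  # {-1, 0, +1}
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                # Only compute the expert when its sign is non-zero.
                mask = (expert_id[:, slot] == e) & (sign[:, slot] != 0)
                if mask.any():
                    w = weights[mask, slot] * sign[mask, slot].float()
                    out[mask] += w.unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = TernaryChoiceMoE(dim=16)
    tokens = torch.randn(32, 16)
    print(moe(tokens).shape)  # torch.Size([32, 16])
```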
View more papers