LLM
The Seed-LLM team is dedicated to boldly exploring the next generation of large models and to studying the fundamental problems of large-model research, including but not limited to pretraining, post-training, inference, memory, learning, and interpretability. We explore frontier techniques and land them end to end, continually searching for where large models meet applications and for the new possibilities they can unlock.
Research Directions
Horizon
The Limits of Long CoT Models
Explore the upper bound of long-reasoning models, expanding continuously along both Inference-Time Scaling and Model Scaling, with the goal of solving complex problems that humans cannot yet solve
O-model Architecture
Scaling laws along the reasoning dimension are an important path toward ultimate intelligence. Our goal is to build lifelong-learning intelligent systems and to give models linear-complexity reasoning ability
Memory
Build streaming Memory that manages unboundedly long context and achieves true online learning, for example: read Introduction to Algorithms and learn to write code, or read a grammar book and learn a new language
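As a toy illustration of the idea (entirely hypothetical: the class name, window size, and naive truncation-based "summarization" below are placeholders, not the team's design), a streaming memory might keep a bounded verbatim buffer and compress overflow into a long-term store instead of discarding it:

```python
from collections import deque


class StreamingMemory:
    """Minimal sketch of streaming memory: a bounded working buffer
    plus a compressed long-term store (here, naive truncation)."""

    def __init__(self, window: int = 4):
        self.window = window      # working-context capacity, in chunks
        self.buffer = deque()     # recent chunks, kept verbatim
        self.long_term = []       # compressed older chunks

    def observe(self, chunk: str) -> None:
        """Ingest one chunk of the stream; compress overflow."""
        self.buffer.append(chunk)
        while len(self.buffer) > self.window:
            old = self.buffer.popleft()
            self.long_term.append(self._summarize(old))

    def _summarize(self, chunk: str) -> str:
        # Placeholder compression: keep only the first few words.
        return " ".join(chunk.split()[:3]) + " ..."

    def context(self) -> str:
        """Everything the model would see: summaries, then recent text."""
        return " | ".join(self.long_term + list(self.buffer))
```

A real system would replace `_summarize` with learned compression and make the long-term store retrievable, but the shape of the problem, bounded working memory over an unbounded stream, is the same.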
Reasoning and Planning Agents
Committed to solving the core foundational problems of the Agent field and building a Super Intelligence System, accelerating leaps in natural science, economic production, and daily life, and comprehensively raising social efficiency and quality of life
Pretrain
New Paradigms for Pretraining and Continual Training
Explore large-scale synthetic data to break through the growth bottleneck of natural corpora; study autonomous data iteration by AI, so that pretraining and post-training pass the torch to each other
The Science of Data Compression with Language Models
The task of compressing human civilization and world knowledge rests on round after round of data mining and discovery, and on one hypothesis and experiment after another
The Limits of Quality and Efficiency
With one hand, probe the boundary of model intelligence and gaze at the stars; with the other, probe the lower bound of parameter scale and stay grounded. A model must be strong in both quality and efficiency
Training Dynamics and Mechanism Research
Extend scaling laws to every dynamic of model optimization, let interpretability mechanisms capture the circuit load inside neural networks, and use physics to dispel the haze that shrouds the alchemist's furnace
Ultra-Long Context
Unlock the model's long-context capability and keep pushing the token-length limits of understanding and generation, so that a model can read ten thousand books and travel ten thousand miles in a single pass
Posttrain
Large Scale Reinforcement Learning
Solve the problem of RL scaling at very large scale, raising the model's intelligence and aligning it with human preferences
Reward Model System
Combine Model, Verifier, Tool, and Agent to provide accurate and generalizable signals for data filtering, data synthesis, and RL training
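One way to picture such a system (a minimal sketch with made-up field names, not the actual Reward Model System) is a router that sends each sample to the most reliable available signal: unit tests for code, an exact-match verifier for checkable answers, and a learned reward model otherwise:

```python
def reward(sample: dict) -> float:
    """Route a sample to the most reliable reward source available.
    Field names ("tests", "reference", "rm_score") are illustrative."""
    if sample.get("tests"):
        # Code with unit tests: reward is the fraction of tests passed.
        passed = sum(1 for t in sample["tests"] if t(sample["answer"]))
        return passed / len(sample["tests"])
    if sample.get("reference") is not None:
        # Verifiable answer (e.g. a math result): exact-match verifier.
        return 1.0 if sample["answer"].strip() == sample["reference"].strip() else 0.0
    # Open-ended output: fall back to a learned reward model's score.
    return sample.get("rm_score", 0.0)
```

The design choice this sketch encodes is that verifiable signals (tests, exact match) are preferred wherever they exist, and the learned model only covers the cases rules cannot.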
Superb Reasoning and Broad Generalization
Push Reasoning further past its boundaries while reaching human-expert level in more domains
Long Horizon Task / Agent
Solve long-horizon, multi-turn modeling for Long Horizon Tasks / Agents, so that models can truly solve the complex problems of the human world
Next-Generation RM / RL Algorithms
Research new RM / RL algorithms capable of breaking through current bottlenecks
Data Quality Optimization
Continuously optimize post-training data to further raise the ceiling of model capability
Code
Code Pretraining
Improve the Doubao model's foundational coding ability through methods such as raw-data filtering and data synthesis built on commit/issue/PR data
Data Synthesis from Execution Feedback
A distinctive property of code data is that it can be "run", allowing compute to be traded at scale for supervision signals beyond what internet data offers. By scaling such methods, we strengthen the code and logic capabilities of next-generation large models
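The core loop is easy to sketch (hypothetical names throughout; a real pipeline sandboxes execution rather than calling `exec` in-process): run a generated function against input/output pairs and use the pass rate as a supervision signal that no static corpus provides:

```python
def execution_feedback(code: str,
                       test_cases: list[tuple[tuple, object]],
                       fn_name: str) -> float:
    """Execute generated code against (args, expected) pairs.
    Returns the pass rate in [0, 1] as a supervision signal."""
    namespace: dict = {}
    try:
        exec(code, namespace)          # define the candidate function
        fn = namespace[fn_name]
    except Exception:
        return 0.0                     # unparseable or missing function
    passed = 0
    for args, expected in test_cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass                       # runtime errors count as failures
    return passed / len(test_cases)
```

Scaled up, a scorer like this turns raw compute into dense, verifiable labels for filtering and reinforcement learning on code.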
Automatic Data Construction for Code Agents
Automatically construct correct and diverse competitive-programming and engineering problems and automate engineering-environment configuration, providing the data foundation for large-scale reinforcement learning of Code Agents
Learning to Learn
Research toward model self-evolution: enable models to learn to acquire and process training data on their own in order to improve themselves
Model
Model Reliability
Study how models can train stably and efficiently while scaling up; analyze and resolve the stability and efficiency problems of parameter optimization during scaling, so that models train stably and maintain good scaling laws
Long Context
Study Long Context in combination with Deep Research and Reasoning, optimizing the performance and efficiency of both training and inference
Model Structure
Study base-model architecture, covering algorithmic questions such as MoE, residual connections, Normalization, and Tokenization, to make LLMs more efficient, and study how model structure affects the performance ceiling of large models
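As one concrete example of the normalization choices mentioned above, RMSNorm (widely used in recent LLMs) rescales activations by their root mean square rather than subtracting a mean and dividing by a standard deviation; a pure-Python sketch:

```python
def rms_norm(x: list[float], gain: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: scale each activation by 1 / RMS(x), then by a learned gain.
    `eps` guards against division by zero on all-zero inputs."""
    rms = (sum(v * v for v in x) / len(x) + eps) ** 0.5
    return [g * v / rms for g, v in zip(gain, x)]
```

After normalization the output vector has unit root mean square (up to `eps`), which is what stabilizes activation magnitudes across deep residual stacks.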
Efficiency
Covers multiple aspects: computational efficiency (completing training and inference within limited compute and time), memory efficiency (occupying less GPU memory), and data efficiency (learning more knowledge from limited data), through techniques such as Quantization, MFU optimization combined with engineering work, and Pruning
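As a minimal illustration of the quantization idea (symmetric int8 with a single per-tensor scale; production systems typically use per-channel or per-group scales plus calibration):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats onto int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    return [round(w / scale) for w in weights], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats: each entry is off by at most scale / 2."""
    return [v * scale for v in q]
```

Storing 8-bit integers plus one float cuts weight memory roughly 4x versus fp32, at the cost of a bounded rounding error per weight, which is exactly the memory-vs-accuracy trade-off quantization research optimizes.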

Selected Papers

2025.05.21
MMaDA: Multimodal Large Diffusion Language Models
We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i) MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components. This architecture ensures seamless integration and processing across different data types. (ii) We implement a mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities. By aligning reasoning processes between textual and visual domains, this strategy facilitates cold-start training for the final reinforcement learning (RL) stage, thereby enhancing the model's ability to handle complex tasks from the outset. (iii) We propose UniGRPO, a unified policy-gradient-based RL algorithm specifically tailored for diffusion foundation models. Utilizing diversified reward modeling, UniGRPO unifies post-training across both reasoning and generation tasks, ensuring consistent performance improvements. Experimental results demonstrate that MMaDA-8B exhibits strong generalization capabilities as a unified multimodal foundation model. It surpasses powerful models like LLaMA-3-7B and Qwen2-7B in textual reasoning, outperforms Show-o and SEED-X in multimodal understanding, and excels over SDXL and Janus in text-to-image generation. These achievements highlight MMaDA's effectiveness in bridging the gap between pretraining and post-training within unified diffusion architectures, providing a comprehensive framework for future research and development. We open-source our code and trained models at: https://github.com/Gen-Verse/MMaDA
Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang
Computer Vision
2025.05.17
Model Merging in Pre-training of Large Language Models
Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model merging techniques during the pre-training process. Through extensive experiments with both dense and Mixture-of-Experts (MoE) architectures ranging from millions to over 100 billion parameters, we demonstrate that merging checkpoints trained with constant learning rates not only achieves significant performance improvements but also enables accurate prediction of annealing behavior. These improvements lead to both more efficient model development and significantly lower training costs. Our detailed ablation studies on merging strategies and hyperparameters provide new insights into the underlying mechanisms while uncovering novel applications. Through comprehensive experimental analysis, we offer the open-source community practical pre-training guidelines for effective model merging.
Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Deyi Liu, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, LingJun Liu, Bole Ma, Xiaoying Jia, Xun Zhou, Siyuan Qiao, Liang Xiang, Yonghui Wu
LLM
2025.04.10
Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
We introduce Seed-Thinking-v1.5, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed-Thinking-v1.5 is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research.
Jiaze Chen, TianTian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang
LLM
Technical Capability Showcase
Seed1.5(Doubao-1.5-pro)
A new-generation flagship model with comprehensively upgraded performance, leading in knowledge, code, reasoning, and more

Open Positions

LLM Algorithm Engineer - Seed
Beijing / Shanghai / Shenzhen / Hangzhou
Experienced hire
LLM Algorithm Research Expert - Seed
Beijing / Shanghai / Shenzhen / Hangzhou
Experienced hire
LLM Inference Algorithm Research Expert - Seed
Beijing / Shanghai / Shenzhen / Hangzhou
Experienced hire
LLM Algorithm Engineer - Top Seed
Beijing / Shanghai / Shenzhen / Hangzhou
Campus hire
LLM Algorithm Intern - Seed
Beijing / Shanghai / Shenzhen / Hangzhou
Internship