LLM
The Seed Large Language Model (LLM) team is dedicated to aggressively advancing the next generation of LLMs and tackling fundamental challenges in LLM development head-on. Our areas of focus include model pretraining, post-training, inference, memory, learning, interpretability, and other related directions. We dive deep into the latest technologies and build comprehensive solutions from concept to completion. In bringing LLMs to real-life scenarios, we persistently seek ways to enhance applications through technological innovation.
Research topics
Horizon
Limits of the Long CoT Model
Explore the limits of long-reasoning models, continually pushing outward along both inference-time scaling and model scaling, with the objective of solving complex problems that humans cannot yet address.
O-model Architecture
Scaling along the inference dimension is a key to achieving ultimate intelligence. We aim to develop a lifelong-learning intelligent system whose reasoning runs with linear complexity.
Memory
Establish a streaming memory mechanism that can manage context of unlimited length and achieve true online learning, such as learning to code by reading algorithms or learning a new language by reading a grammar book.
Agent's Reasoning and Planning Capabilities
Solve the core, fundamental problems in the agent field and build a superintelligent system that accelerates leapfrog development in the natural sciences, economic production, and daily life, comprehensively improving societal efficiency and the quality of human life.
Pretrain
Next-Generation Pretraining Paradigm
Explore large-scale synthetic data to overcome the growth constraints of real-world data, and research autonomous data iteration by AI to ensure a seamless transition between pre-training and post-training.
Data Compression for Language Models
Compressing human civilization and world knowledge relies on continuous data mining and repeated cycles of hypothesis and experiment.
Push the Limit of Model Performance and Efficiency
Push the boundaries of model intelligence with visionary ambition while pragmatically exploring the lower limit of parameter scale; a model must excel in both performance and efficiency.
Research on Training Dynamics and Mechanisms
Extend scaling laws to every aspect of model optimization, and use interpretability mechanisms to inspect what neural networks are doing internally. Reveal the principles of model training from a physics perspective. A toy scaling-law fit is sketched below.
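As a toy illustration of the scaling-law perspective (a minimal sketch on synthetic data, not the team's methodology), the usual saturating power-law form L(N) = a·N^(-b) + c can be fit to loss-versus-parameter-count measurements:

```python
# Toy sketch (synthetic data, illustrative only): fit a saturating power law
# L(N) = a * N^(-b) + c, the standard functional form in scaling-law studies.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

np.random.seed(0)
# Hypothetical measurements: model sizes (parameters) and eval losses.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
losses = power_law(sizes, a=2.0e2, b=0.28, c=1.7) + np.random.normal(0, 0.01, 5)

params, _ = curve_fit(power_law, sizes, losses, p0=(100.0, 0.3, 1.5), maxfev=10000)
a, b, c = params
print(f"fit: L(N) = {a:.1f} * N^(-{b:.3f}) + {c:.2f}")
# Extrapolate the fitted curve to a larger model size.
print(f"predicted loss at N=1e11: {power_law(1e11, *params):.3f}")
```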
Enhanced Long Context Capabilities
Unlock the model's capacity to manage long contexts and raise token-length limits for better content understanding and generation; the model should be able to grasp and generate extensive content within a single pass.
Posttrain
Large Scale Reinforcement Learning
Address the issue of large-scale RL scaling, enhance model capabilities, and align with human preferences.
Reward Model System
Integrate models, verifiers, tools, and agents to provide accurate, generalizable signals for data selection, synthesis, and RL training.
Superb Reasoning and Generalization
Further push the boundaries of reasoning, achieving human expert level across more domains.
Long-Horizon Tasks / Agents
Address long-range, multi-turn modeling in long-horizon tasks and agents, enabling models to truly solve complex problems in the human world.
Next Generation of RM/RL Algorithms
Explore new RM/RL algorithms that can overcome the current limitations.
Data Quality Optimization
Continuously optimize post-training data to further enhance the model's capability limits.
Code
Code Pre-training
Enhance the foundational coding abilities of the Doubao model through methods such as raw-data filtering and data synthesis based on commit/issue/PR data.
Data Synthesis Based on Execution Feedback
What sets code data apart is that it can be executed, letting computational power supply supervision signals beyond what internet data provides. Scaling up these methods strengthens the coding and logic capabilities of next-generation large models (see the sketch below).
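A minimal sketch of execution-feedback filtering, under stated assumptions: `candidates` stands in for samples from any code model, and tests are plain assert statements; only candidates that run and pass are kept as training data.

```python
# Minimal execution-feedback sketch (assumptions: candidate solutions come
# from some code model; tests are plain assert statements). Running the code
# turns compute into a supervision signal: only passing samples are kept.
import os
import subprocess
import sys
import tempfile

def passes_tests(solution_code: str, test_code: str, timeout_s: int = 5) -> bool:
    """Run candidate plus tests in a fresh interpreter; exit code 0 = pass."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code + "\n\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout_s)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def synthesize(problem: str, tests: str, candidates: list[str]) -> list[dict]:
    """Keep only execution-verified (prompt, completion) training pairs."""
    return [{"prompt": problem, "completion": code}
            for code in candidates if passes_tests(code, tests)]

# Hypothetical usage: one buggy and one correct candidate for a toy problem.
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
candidates = [
    "def add(a, b):\n    return a - b",  # fails, discarded
    "def add(a, b):\n    return a + b",  # passes, kept
]
print(synthesize("Write add(a, b) returning the sum.", tests, candidates))
```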
Automatic Construction of Code Agent Data
Automatically construct correct, diverse coding-competition and engineering problems, and automate engineering-environment configuration, to provide data support for large-scale reinforcement learning of code agents.
Learning to Learn
Research aimed at model self-evolution, enabling the model to learn to acquire and process training data to improve itself.
Model
Model Reliability
Ensure stable and efficient training as models scale up: analyze and resolve the stability and efficiency issues that arise in parameter optimization during scaling, so that training remains stable and follows the scaling law.
Long Context
Research long-context modeling and combine it with deep research and reasoning to optimize the performance and efficiency of both training and inference.
Model Structure
Study the structure of foundation models, such as MoE, residual connections, normalization, tokenization, and other algorithmic aspects to achieve higher efficiency in LLMs. Investigate how model structures impact the performance ceiling of large models.
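As a concrete illustration of the MoE pattern named above, a minimal sketch of a generic top-k routed MoE layer (an illustrative toy, not the Seed model's actual architecture):

```python
# Generic top-k routed MoE layer (illustrative toy, not the Seed architecture).
# A router scores experts per token and only the top-k experts run, so the
# active parameter count stays small while total parameters grow with experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k experts.
        logits = self.router(x)                      # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)   # both (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Hypothetical usage: 10 tokens of width 64; output keeps the input shape.
layer = TopKMoE(d_model=64, d_ff=256)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```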
Efficiency
Covers multiple aspects, including computational efficiency (completing training and inference within limited compute and time), storage efficiency (occupying less GPU memory), and data-utilization efficiency (learning more knowledge from limited data), through techniques such as quantization, engineering optimization of MFU, and pruning. A toy quantization sketch follows.
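For the storage-efficiency direction, a minimal sketch of per-row absmax int8 weight quantization (an illustrative toy, not the team's production method):

```python
# Toy per-row absmax int8 weight quantization (illustrative only).
# Stores ~4x less than float32; dequantize on the fly for matmuls.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Per-row symmetric quantization: w ≈ scale[:, None] * q."""
    scale = np.abs(w).max(axis=1) / 127.0          # one scale per output row
    scale = np.where(scale == 0, 1.0, scale)       # avoid division by zero
    q = np.clip(np.round(w / scale[:, None]), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale[:, None]

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
print(f"int8 bytes: {q.nbytes + scale.nbytes} vs float32 bytes: {w.nbytes}, "
      f"max err: {err:.4f}")
```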

Selected Papers

May 21, 2025
MMaDA: Multimodal Large Diffusion Language Models
We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i) MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components. This architecture ensures seamless integration and processing across different data types. (ii) We implement a mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities. By aligning reasoning processes between textual and visual domains, this strategy facilitates cold-start training for the final reinforcement learning (RL) stage, thereby enhancing the model's ability to handle complex tasks from the outset. (iii) We propose UniGRPO, a unified policy-gradient-based RL algorithm specifically tailored for diffusion foundation models. Utilizing diversified reward modeling, UniGRPO unifies post-training across both reasoning and generation tasks, ensuring consistent performance improvements. Experimental results demonstrate that MMaDA-8B exhibits strong generalization capabilities as a unified multimodal foundation model. It surpasses powerful models like LLaMA-3-7B and Qwen2-7B in textual reasoning, outperforms Show-o and SEED-X in multimodal understanding, and excels over SDXL and Janus in text-to-image generation. These achievements highlight MMaDA's effectiveness in bridging the gap between pretraining and post-training within unified diffusion architectures, providing a comprehensive framework for future research and development. We open-source our code and trained models at: https://github.com/Gen-Verse/MMaDA
Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang
Computer Vision
May 17, 2025
Model Merging in Pre-training of Large Language Models
Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model merging techniques during the pre-training process. Through extensive experiments with both dense and Mixture-of-Experts (MoE) architectures ranging from millions to over 100 billion parameters, we demonstrate that merging checkpoints trained with constant learning rates not only achieves significant performance improvements but also enables accurate prediction of annealing behavior. These improvements lead to both more efficient model development and significantly lower training costs. Our detailed ablation studies on merging strategies and hyperparameters provide new insights into the underlying mechanisms while uncovering novel applications. Through comprehensive experimental analysis, we offer the open-source community practical pre-training guidelines for effective model merging.
Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Deyi Liu, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, LingJun Liu, Bole Ma, Xiaoying Jia, Xun Zhou, Siyuan Qiao, Liang Xiang, Yonghui Wu
LLM
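The abstract above describes merging checkpoints from a single pre-training run; its simplest instance is plain weight averaging across checkpoints. A minimal sketch of that general idea (our illustration, not the paper's exact recipe):

```python
# Plain checkpoint weight averaging, the simplest form of model merging
# (our illustration of the general idea, not the paper's exact recipe).
import torch

def merge_checkpoints(paths: list[str], out_path: str) -> None:
    """Average parameter tensors across same-architecture checkpoints."""
    merged = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: v.float().clone() for k, v in state.items()}
        else:
            for k, v in state.items():
                merged[k] += v.float()
    for k in merged:
        merged[k] /= len(paths)
    torch.save(merged, out_path)

# Hypothetical usage with checkpoints saved along one pre-training run:
# merge_checkpoints(["ckpt_90k.pt", "ckpt_95k.pt", "ckpt_100k.pt"], "merged.pt")
```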
Apr 10, 2025
Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
We introduce Seed-Thinking-v1.5, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed-Thinking-v1.5 is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research.
Jiaze Chen, TianTian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang
LLM

Featured Jobs

Research Scientist in Large Language Model
San Jose/Seattle
Experienced Hiring
Research Scientist, Reinforcement Learning
San Jose/Seattle
Experienced Hiring
Research Scientist in LLM Foundation Models (reasoning, planning & agent), Seed, PhD Graduates- 2025 Start
San Jose/Seattle
Campus Recruitment
Research Scientist in Large Language, University Graduates (Search) - 2025 Start (PhD)
San Jose/Seattle
Campus Recruitment
Student Researcher in Foundation Models (Reasoning, Planning & Agent) - Seed - 2025 Start (PhD)
Seattle/San Jose
Internship
Student Researcher (Seed - LLM Post-training) - 2025 Start (PhD)
Seattle/San Jose
Internship