LLM
The Seed LLM team is dedicated to advancing the next generation of LLMs, tackling fundamental challenges in LLM development head-on. Our areas of focus include model pretraining, posttraining, inference, memory, learning, interpretability, and related directions. We dive deep into the latest techniques and build complete solutions from concept to deployment. In bringing LLMs to real-world scenarios, we persistently seek ways to improve applications through technological innovation.
Main areas of focus
Horizon
We're a team dedicated to cutting-edge research, driven by a mission to push the boundaries of model intelligence and fueled by a long-term vision and unwavering commitment. We are seeking passionate, self-driven researchers who share our vision to collaborate on LLM research.
Pretrain
The team is dedicated to developing the next generation of pretraining paradigms, ensuring that scaling laws continue to hold even as computing power grows by several orders of magnitude, thereby raising the upper limit of general intelligence. At the same time, we focus on AI for math, solving genuinely challenging and valuable mathematical problems while advancing the field for humanity's benefit.
Posttrain
The team is responsible for all aspects of LLM posttraining and provides the core posttraining foundations for unified multimodal models. Its goal is to research and explore next-generation technologies for the posttraining phase, such as SFT, reward modeling (RM), RL, and self-learning, while making significant optimizations and improvements in key areas like reasoning, coding, and agents.
Research topics
Horizon
Limits of the Long CoT Model
Explore the limits of long reasoning models, continuously expanding from the perspective of inference-time scaling and model scaling, with the objective of solving complex problems that humans cannot yet address.
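Inference-time scaling in its simplest form can be sketched as best-of-N sampling: spend more compute at inference by drawing many candidate solutions and keeping the best one. This is an illustrative sketch, not the team's method; `sample` and `score` are hypothetical callables standing in for a model and a verifier or reward.

```python
import random

def best_of_n(prompt, sample, score, n=8, seed=0):
    """Best-of-N sampling: draw n candidates and return the highest-scoring
    one. `sample(prompt, rng)` and `score(prompt, candidate)` are
    illustrative stand-ins for a generator and a verifier/reward."""
    rng = random.Random(seed)
    candidates = [sample(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```

With a reliable scorer, accuracy typically improves as n grows, which is the basic lever behind inference-time scaling.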
O-model Architecture
Scaling along the inference dimension is key to achieving ultimate intelligence. We aim to develop a lifelong-learning intelligent system, enabling models to reason with linear complexity.
Memory
Establish a streaming memory mechanism that can manage context of unbounded length and truly achieve online learning, such as learning to code by reading algorithm references or learning a new language by reading a grammar book.
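One simple way to picture a streaming memory is a bounded buffer that keeps recent context verbatim and folds evicted chunks into a running summary. This is a minimal sketch under that assumption; the `summarize` callable is a hypothetical stand-in for a learned compressor, not any specific mechanism the team uses.

```python
from collections import deque

class StreamingMemory:
    """Sketch of a bounded streaming context: recent chunks are kept
    verbatim; older chunks are compressed into a running summary.
    `summarize` is an illustrative stand-in for a learned compressor."""

    def __init__(self, window, summarize=lambda parts: " ".join(parts)):
        self.window = window      # number of recent chunks kept in full
        self.recent = deque()     # verbatim recent context
        self.summary = ""         # compressed long-range memory
        self.summarize = summarize

    def observe(self, chunk):
        """Ingest one chunk of the stream, evicting into the summary."""
        self.recent.append(chunk)
        if len(self.recent) > self.window:
            evicted = self.recent.popleft()
            # fold the evicted chunk into the long-range summary
            self.summary = (self.summarize([self.summary, evicted])
                            if self.summary else evicted)

    def context(self):
        """What a model would condition on: summary plus recent window."""
        return " | ".join(filter(None, [self.summary, *self.recent]))
```

The point of the sketch is the shape of the problem: total state stays bounded while the stream is unbounded, so the quality of the compressor decides what long-range information survives.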
Agent's Reasoning and Planning Capabilities
Committed to solving the core, fundamental problems in the agent field: building a superintelligent system that accelerates leapfrog development in the natural sciences, economic production, and daily life, comprehensively raising societal efficiency and the quality of human life.
Pretrain
Next-Generation Pretraining Paradigm
Explore model self-training and evolution based on agents and active learning; explore large-scale synthetic-data pretraining to break through the bottleneck of human data; explore multimodal joint pretraining and better modeling methods to raise the upper limits of intelligence.
High-capability Ultra-small Models
Research how to elicit strong reasoning capabilities in models at the 1B-parameter scale, along with the new data and modeling methods needed to support them.
AI for Math
The long-term vision is to enable AI to solve, automatically or in collaboration with mathematicians, truly challenging and valuable mathematical propositions, such as the Riemann Hypothesis. By effectively combining natural language (NL) and formal language (FL), we can explore a new paradigm for the next generation of provers with a higher ceiling.
Posttrain
Large Scale Reinforcement Learning
Address the issue of large-scale RL scaling, enhance model capabilities, and align with human preferences.
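The workhorse objective in most large-scale RL pipelines for LLMs is a PPO-style clipped policy-gradient loss, which keeps each update close to the sampling policy. This is a generic sketch of that standard objective, not the team's specific algorithm; all names are illustrative.

```python
import numpy as np

def clipped_pg_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style clipped surrogate objective (generic sketch).
    Inputs are per-token log-probs under the current and sampling
    policies, plus advantage estimates. Returns the loss to minimize."""
    ratio = np.exp(logp_new - logp_old)                  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # pessimistic (min) bound, averaged over tokens; negate to minimize
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping bounds how far a single batch can push the policy, which is one of the stability levers that matters when scaling RL to large models and long rollouts.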
Reward Model System
Integrate model, verifier, tool, and agent to provide accurate and generalized signals for data selection, synthesis, and RL training.
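A common way to integrate these signal sources is a tiered reward: prefer an exact verifier (a unit test, a math checker) when one applies, and fall back to a learned reward model otherwise. This is a hedged sketch of that pattern; `verifier` and `reward_model` are hypothetical callables, not a specific API.

```python
def combined_reward(prompt, response, verifier=None, reward_model=None):
    """Tiered reward signal (illustrative sketch): exact verifier first,
    learned reward model as fallback. `verifier` returns True/False when
    it applies and None when it cannot judge the prompt."""
    if verifier is not None:
        verdict = verifier(prompt, response)
        if verdict is not None:
            return 1.0 if verdict else 0.0   # exact, hard-to-hack signal
    if reward_model is not None:
        return reward_model(prompt, response)  # scalar preference score
    return 0.0                                 # no signal available
```

The design choice the sketch illustrates: verifiable domains get binary, ungameable rewards, while open-ended domains rely on a generalizing reward model, and both feed the same RL or data-selection pipeline.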
The Generalization of o1 / Long CoT in General Domains
Enable o1-level reasoning capabilities to break through the boundaries of math and code, and reach the level of human experts in more fields.
Long Horizon Task / Agent
Address long-horizon, multi-turn modeling in agent tasks, enabling models to truly solve complex problems in the human world.
Model OOD Solution Capability
Enable models to solve problems well even outside their training distribution, across all problem types.
Next Generation of RM/RL Algorithms
Explore new RM/RL algorithms that can overcome the current limitations.
Data Mining and Synthesis
Mine scarce posttraining data in the public domain and use posttraining technology to synthesize data based on seed data, further improving the upper limit of model capabilities.

Selected Papers

Apr 01, 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
The rapid escalation of LLM benchmark difficulty in recent years, from elementary school-level to frontier problems, has created the impression that we are only inches away from surpassing human intelligence. However, does the LLMs' remarkable reasoning ability indeed come from true intelligence by human standards, or are they simply reciting solutions witnessed during training at an Internet scale? To study this problem, we propose RoR-Bench, a novel multi-modal benchmark for detecting LLMs' recitation behavior when asked simple reasoning problems with subtly shifted conditions, and conduct empirical analysis on our benchmark. Surprisingly, we found that existing cutting-edge LLMs unanimously exhibit extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer a 60% performance loss.
Kai Yan, Yufei Xu, Zhengyin Du, Xuesong Yao, Zheyu Wang, Xiaowen Guo, Jiecao Chen
LLM
Mar 18, 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 blog and DeepSeek R1 technical report), thus the community still struggles to reproduce their RL training results.
Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Weinan Dai, Yuxuan Song, Xiangpeng Wei, Hao Zhou, Jingjing Liu, Wei-Ying Ma, Ya-Qin Zhang, Lin Yan, Mu Qiao, Yonghui Wu, Mingxuan Wang
Reinforcement Learning
Mar 03, 2025
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model
Large language models (LLMs) have shown significant multilingual capabilities. However, the mechanisms underlying the development of these capabilities during pre-training are not well understood. In this paper, we use code LLMs as an experimental platform to explore the evolution of multilingual capabilities in LLMs during the pre-training process. Based on our observations, we propose the Babel Tower Hypothesis, which describes the entire process of LLMs acquiring new language capabilities. During the learning process, multiple languages initially share a single knowledge system dominated by the primary language and gradually develop language-specific knowledge systems. We then validate the above hypothesis by tracking the internal states of the LLMs through identifying working languages and language transferring neurons. Experimental results show that the internal state changes of the LLM are consistent with our Babel Tower Hypothesis. Building on these insights, we propose a novel method to construct an optimized pre-training corpus for multilingual code LLMs, which significantly outperforms LLMs trained on the original corpus. The proposed Babel Tower Hypothesis provides new insights into designing pre-training data distributions to achieve optimal multilingual capabilities in LLMs.
Jiawei Chen, Wentao Chen, Jing Su, Jingjing Xu, Hongyu Lin, Mengjie Ren, Yaojie Lu, Xianpei Han, Le Sun
LLM

Featured Jobs

Research Scientist in Large Language Model
San Jose/Seattle
Experienced Hiring
Research Scientist, Reinforcement Learning
San Jose/Seattle
Experienced Hiring
Research Scientist in LLM Foundation Models (reasoning, planning & agent), Doubao, PhD Graduates- 2025 Start
San Jose/Seattle
Campus Recruitment
Research Scientist in Large Language Models, University Graduates (Search) - 2024 Start (PhD)
San Jose/Seattle
Campus Recruitment
Student Researcher in Foundation Models (Reasoning, Planning & Agent) - Doubao (Seed) - 2025 Start (PhD)
Seattle/San Jose
Internship
Student Researcher (Doubao (Seed) - LLM Post-training) - 2025 Start (PhD)
Seattle/San Jose
Internship