Infrastructures
The Seed Infrastructures team oversees distributed training, reinforcement learning frameworks, high-performance inference, and heterogeneous-hardware compilation for AI foundation models.

Research topics

Ultra-large-scale training clusters
Study methods to improve the stability and model FLOPs utilization (MFU) of large-scale training clusters, including cross-cluster, low-precision, fault-tolerant, and elastic training techniques (a rough MFU estimate is sketched below).
Large-scale
Stability
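
As a rough illustration of the headline metric here, the sketch below computes the common MFU estimate from the "6N FLOPs per token" rule of thumb for dense transformers; the function and the example numbers are our own assumptions, not figures from any Seed cluster.

    def model_flops_utilization(n_params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
        """Rough MFU estimate: achieved training FLOPs over hardware peak FLOPs."""
        # Dense-transformer rule of thumb: ~6 FLOPs per parameter per token
        # (forward + backward); MoE and attention corrections are omitted.
        achieved_flops = 6.0 * n_params * tokens_per_sec
        peak_flops = n_gpus * peak_flops_per_gpu
        return achieved_flops / peak_flops

    # Hypothetical example: a 70B-parameter dense model training at 4M tokens/s
    # on 8192 accelerators with ~989 TFLOPs of BF16 peak each.
    print(model_flops_utilization(70e9, 4.0e6, 8192, 989e12))  # ~0.21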

Reinforcement learning systems
Research on end-to-end reinforcement learning systems for large models, designing next-generation RL systems for dynamic loads, complex agent-environment interactions, heterogeneous resources, and multimodal scenarios (a toy rollout/learner skeleton follows below).
Reinforcement learning
Agent
Optimization
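
To make the moving parts concrete, here is a toy, runnable skeleton of the rollout/learner decoupling such systems manage, with threads standing in for distributed actors and random sleeps standing in for variable-length agent-environment interactions; every name here is illustrative, not a Seed API.

    import queue
    import random
    import threading
    import time

    rollout_queue = queue.Queue(maxsize=64)  # absorbs bursty, variable-length rollouts

    def actor_loop(actor_id):
        # Stand-in for agent/environment interaction: rollouts take variable time,
        # which is exactly the dynamic load an RL system must schedule around.
        for step in range(5):
            time.sleep(random.uniform(0.01, 0.05))
            rollout_queue.put((actor_id, step, [random.random() for _ in range(4)]))

    def learner_loop(n_updates):
        for _ in range(n_updates):
            batch = [rollout_queue.get() for _ in range(4)]  # blocks until rollouts arrive
            # Stand-in for an RL update (e.g. a PPO-style step) on the batch.
            print("update on rollouts from actors:", sorted({a for a, _, _ in batch}))

    actors = [threading.Thread(target=actor_loop, args=(i,)) for i in range(4)]
    for t in actors:
        t.start()
    learner_loop(n_updates=5)
    for t in actors:
        t.join()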

Inference parallelization solutions
Research on overcoming compute and memory-access bottlenecks during inference, including multi-node inference and parallel inference strategies on heterogeneous hardware (a tensor-parallelism sketch follows below).
Inference
Parallel
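
As one concrete instance of a parallel inference strategy, the NumPy sketch below simulates tensor parallelism for a single linear layer, with the final concatenation standing in for an all-gather across devices; it is our illustration of the general technique, not a description of Seed's inference stack.

    import numpy as np

    def tensor_parallel_matmul(x, w, n_devices):
        """Column-parallel linear layer: shard w's output dimension across devices."""
        shards = np.split(w, n_devices, axis=1)     # each device holds one weight shard
        partials = [x @ shard for shard in shards]  # local matmuls, no communication
        return np.concatenate(partials, axis=-1)    # stands in for an all-gather

    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 512))     # (batch, hidden)
    w = rng.standard_normal((512, 2048))  # (hidden, ffn)
    assert np.allclose(tensor_parallel_matmul(x, w, n_devices=4), x @ w)

Splitting the output dimension keeps each device's matmul independent, so the communication cost concentrates in the gather; that is where multi-node inference strategies differ most.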

Next-generation model and hardware co-optimization
Research on advanced model architectures and training and inference paradigms by co-designing next-generation hardware systems together with emerging generative and understanding model architectures.
Systems-algorithm co-design
Model architecture

Compiler optimization for heterogeneous hardware
Research on high-performance operator compilation and joint optimization of computation and communication for emerging hardware architectures (a compute-communication overlap sketch follows below).
Heterogeneous systems
Compiler
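
One classic target of such joint optimization is hiding communication behind computation. The toy sketch below pipelines a simulated collective against local matmuls in plain Python; the scheduling pattern is the point, and the "all-reduce" is a stand-in, not a real compiler pass or collective.

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    def all_reduce_sim(chunk):
        # Stand-in for a collective; in a real system this is network time.
        return chunk * 2.0

    def compute(chunk):
        return chunk @ chunk.T

    rng = np.random.default_rng(0)
    chunks = [rng.standard_normal((256, 256)) for _ in range(8)]

    # Software pipeline: launch communication for chunk i+1 while computing on
    # chunk i, so network time hides behind the matmuls instead of serializing.
    results = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        in_flight = comm.submit(all_reduce_sim, chunks[0])
        for i in range(len(chunks)):
            ready = in_flight.result()
            if i + 1 < len(chunks):
                in_flight = comm.submit(all_reduce_sim, chunks[i + 1])  # overlap
            results.append(compute(ready))
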
Selected Papers

May 13, 2025
Seed1.5-VL Technical Report
We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we provide a comprehensive review of our experience in building Seed1.5-VL across model design, data construction, and training at various stages, hoping that it can inspire further research. Seed1.5-VL is now accessible at this https URL (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428)
Seed Multimodal Team
LLM

Apr 24, 2025
Let the Code LLM Edit Itself When You Edit the Code
In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequence length is long. Simply encoding the edited subsequence and integrating it into the original KV cache runs into a temporal confusion problem, leading to significantly worse performance. We address this efficiency-accuracy trade-off by introducing Positional Integrity Encoding (PIE). Building upon rotary positional encoding, PIE first removes the rotary matrices in the Key cache that introduce temporal confusion and then reapplies the correct rotary matrices. This process ensures that positional relationships between tokens are correct and requires only a single round of matrix multiplication. We validate the effectiveness of PIE through extensive experiments on the RepoBench-C-8k dataset, using DeepSeek-Coder models with 1.3B, 6.7B, and 33B parameters. Our evaluation covers three real-world coding tasks: code insertion, code deletion, and multi-place code editing. Results demonstrate that PIE reduces computational overhead by over 85% compared to the standard full-recomputation approach across all model sizes and tasks while closely matching the model's performance. (A toy sketch of this positional correction follows the entry below.)
Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He
NLP
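
To make PIE's mechanism concrete, here is a minimal NumPy sketch of the positional correction as we read it from the abstract: because rotary encodings compose additively, a cached key can be moved from its old position to a new one with a single extra rotation by the position delta. The interleaved-pair RoPE convention and all names below are our assumptions, not the paper's code.

    import numpy as np

    def rotate_pairs(x, angles):
        """Rotate consecutive (even, odd) feature pairs of x by the given angles."""
        x1, x2 = x[..., 0::2], x[..., 1::2]
        cos, sin = np.cos(angles), np.sin(angles)
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    def rope_angles(pos, dim, base=10000.0):
        inv_freq = base ** (-np.arange(dim // 2) / (dim // 2))
        return pos[:, None] * inv_freq

    def apply_rope(x, pos):
        return rotate_pairs(x, rope_angles(pos, x.shape[-1]))

    def reindex_keys(k_cache, old_pos, new_pos):
        # R(a) followed by R(b) equals R(a + b), so correcting a cached key only
        # needs one rotation by (new_pos - old_pos) -- no full re-encoding.
        return rotate_pairs(k_cache, rope_angles(new_pos - old_pos, k_cache.shape[-1]))

    # Sanity check: correcting stale keys matches re-encoding at the new positions.
    rng = np.random.default_rng(0)
    k = rng.standard_normal((5, 64))     # 5 cached tokens, head dim 64
    old_pos = np.arange(5)
    new_pos = np.array([0, 1, 4, 5, 6])  # e.g. two lines inserted after token 1
    assert np.allclose(reindex_keys(apply_rope(k, old_pos), old_pos, new_pos),
                       apply_rope(k, new_pos))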

Apr 15, 2025
Seedream 3.0 Technical Report
We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 stem from improvements across the entire pipeline, from data construction to model deployment. At the data level, we double the dataset using a defect-aware training paradigm and a dual-axis collaborative data-sampling framework. Furthermore, we adopt several effective techniques such as mixed-resolution training, cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling in the pre-training phase. During the post-training stage, we use diversified aesthetic captions in SFT and a scaled VLM-based reward model, thereby achieving outputs that align well with human preferences. Seedream 3.0 also pioneers a novel acceleration paradigm: by employing consistent noise expectation and importance-aware timestep sampling, we achieve a 4x to 8x speedup while maintaining image quality. Seedream 3.0 demonstrates significant improvements over Seedream 2.0: it enhances overall capabilities, in particular text rendering of complicated Chinese characters, which is important for professional typography generation. In addition, it provides native high-resolution output (up to 2K), allowing it to generate images with high visual quality.
Seed Vision Team
Computer Vision
Featured Jobs
Research Scientist in ML Systems | Seattle / San Jose | Experienced Hiring
Software Engineer, ML System Architecture | Seattle / San Jose | Experienced Hiring
Research Scientist, Applied Machine Learning | Seattle / San Jose | Campus Recruitment
Software Engineer in Machine Learning Systems | Seattle / San Jose | Campus Recruitment
Software Engineer Intern (Seed - Machine Learning System) | Seattle / San Jose | Internship
Research Scientist Intern (Seed - Machine Learning System) | Seattle / San Jose | Internship