Infrastructures
The Seed-Infrastructures team is responsible for distributed training of large models, reinforcement learning frameworks, high-performance inference, compilers for heterogeneous hardware, and related systems work.

Research Directions

Ultra-Large-Scale Distributed Training
Research on ultra-large-scale training clusters: improving training stability and MFU (Model FLOPs Utilization), along with cross-cluster, low-precision, fault-tolerant, and elastic training (see the MFU sketch below).
Large-scale
Stability
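
To make the MFU target concrete, here is a minimal sketch of how Model FLOPs Utilization is commonly estimated for dense transformer training. The ~6 FLOPs per parameter per token rule of thumb and every number below are illustrative assumptions, not figures from the team.

```python
# Minimal sketch: estimating Model FLOPs Utilization (MFU) for dense
# transformer training, using the common ~6 FLOPs per parameter per token
# approximation (forward + backward). All numbers are illustrative.

def mfu(params: float, tokens_per_sec: float,
        num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Achieved training FLOP/s divided by the cluster's theoretical peak."""
    achieved = 6.0 * params * tokens_per_sec
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

if __name__ == "__main__":
    # Hypothetical 70B-parameter run on 1,024 GPUs with 989 TFLOP/s BF16 peak.
    print(f"MFU: {mfu(70e9, 1.0e6, 1024, 989e12):.1%}")
```

Stability, fault tolerance, and elasticity work then amounts to keeping this ratio high across hardware failures and cluster changes.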

Reinforcement Learning Systems
Research on end-to-end reinforcement learning systems for large models: designing next-generation systems for dynamic workloads, complex agent/environment interaction, heterogeneous resources, and multimodal scenarios (a rollout/learner sketch follows below).
Reinforcement learning
Agent
Optimization
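
A central pattern in such systems is decoupling rollout generation from policy updates so that inference and training resources can scale independently. The toy below shows only the queue-based hand-off; the stub generator, reward, and update step are hypothetical stand-ins, not the team's framework.

```python
# Minimal sketch of a decoupled rollout/learner loop for RL post-training.
# Real systems replace the stubs with distributed inference engines and
# trainers; everything here is a toy stand-in.
import queue
import random
import threading

rollouts: "queue.Queue[tuple[str, float]]" = queue.Queue(maxsize=64)

def rollout_worker(n: int) -> None:
    """Stand-in for inference workers: generate a response, score it, enqueue."""
    for i in range(n):
        response = f"sample-{i}"
        reward = random.random()  # stand-in for a reward model or verifier
        rollouts.put((response, reward))

def learner(steps: int) -> None:
    """Stand-in for the trainer: consume batches and apply policy updates."""
    for step in range(steps):
        batch = [rollouts.get() for _ in range(4)]
        avg_reward = sum(r for _, r in batch) / len(batch)
        print(f"step {step}: update on {len(batch)} rollouts, "
              f"avg reward {avg_reward:.3f}")

producer = threading.Thread(target=rollout_worker, args=(12,))
producer.start()
learner(steps=3)
producer.join()
```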

Inference Parallelism
Research on resolving the compute and memory-access bottlenecks of inference, multi-node inference, and parallel inference schemes and scheduling optimization for heterogeneous hardware (a KV-cache sizing sketch follows below).
Inference
Parallel
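
One reason memory access dominates inference is the KV cache that every in-flight request pins in accelerator memory. The sizing sketch below assumes a hypothetical 70B-class model shape; the figures are assumptions for illustration only.

```python
# Minimal sketch: estimating the per-request KV-cache footprint that must
# stay resident in accelerator memory. Model shape is an illustrative
# assumption, not a Seed model.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """2 tensors (K and V) per layer, each of shape [seq_len, kv_heads, head_dim]."""
    return 2 * layers * seq_len * kv_heads * head_dim * dtype_bytes

if __name__ == "__main__":
    # Hypothetical 70B-class model: 80 layers, 8 KV heads (GQA), head_dim 128.
    per_request = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                                 seq_len=32_768, dtype_bytes=2)
    print(f"KV cache per 32k-token request: {per_request / 2**30:.1f} GiB")
```

At roughly 10 GiB per long-context request under these assumptions, a single 80 GB device holds only a handful of requests, which is what pushes serving toward multi-device parallelism and careful scheduling.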

Joint Optimization of Next-Generation Models and Hardware
Research combining next-generation hardware systems with next-generation architectures for generation and understanding models: more advanced model structures, training paradigms, and inference paradigms (a roofline-style sketch follows below).
Systems-algorithm co-design
Model architecture
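
A basic tool in systems-algorithm co-design is the roofline check: compare an operator's arithmetic intensity (FLOPs per byte moved) against the hardware's compute-to-bandwidth ratio to decide whether a proposed model structure will be compute-bound or memory-bound. The accelerator numbers below are illustrative assumptions.

```python
# Minimal sketch of a roofline-style check: an operator is compute-bound
# when its arithmetic intensity exceeds the hardware's ridge point
# (peak FLOP/s divided by peak memory bandwidth). Numbers are illustrative.

def bound(intensity: float, peak_flops: float, peak_bw: float) -> str:
    ridge = peak_flops / peak_bw  # FLOP/byte at the roofline ridge point
    return "compute-bound" if intensity >= ridge else "memory-bound"

if __name__ == "__main__":
    # GEMM: C[m,n] = A[m,k] @ B[k,n] in BF16 (2 bytes per element).
    m, n, k = 4096, 4096, 4096
    flops = 2 * m * n * k
    bytes_moved = 2 * (m * k + k * n + m * n)
    ai = flops / bytes_moved
    # Hypothetical accelerator: 989 TFLOP/s peak, 3.35 TB/s HBM bandwidth.
    print(f"AI = {ai:.0f} FLOP/B -> {bound(ai, 989e12, 3.35e12)}")
```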

Compiler Optimization for Heterogeneous Hardware
Research on compiler optimization of high-performance operators for new hardware architectures, and joint optimization of computation and communication (a loop-tiling sketch follows below).
Heterogeneous systems
Compiler
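
Loop tiling is one of the canonical transformations such compilers apply when lowering operators to a target. The pure-Python toy below only illustrates the blocking structure; a real compiler would emit tiled kernels tuned to the hardware's memory hierarchy.

```python
# Minimal sketch of loop tiling: iterate over cache-sized blocks so each
# tile of A and B is reused from fast memory. Pure-Python toy for
# illustration only; behavior matches a plain matmul.
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 32) -> np.ndarray:
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Each block-product reuses a tile of A and B held in cache.
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

# Quick check against the reference implementation.
A = np.random.rand(64, 96).astype(np.float32)
B = np.random.rand(96, 128).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```
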
Selected Publications

2025.06.11
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Notable advances in diffusion modeling have propelled rapid improvements in video generation, yet current foundation models still confront critical challenges in synergistically balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient pre-training paradigm that enables multiple features and functions such as interleaved multimodal positional encoding, native multi-shot generation capacity, and multi-task modeling; (iii) carefully designed post-training optimization leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms for considerable performance improvements; (iv) excellent model acceleration achieving a 10× inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds. Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation, superior spatiotemporal fluidity and structural stability, precise instruction adherence in complex multi-subject contexts, native multi-shot narrative coherence with consistent subject representation, and ultra-fast inference.
Seed Vision Team
Computer Vision

2025.06.05
SeedEdit 3.0: Fast and High-Quality Generative Image Editing
We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over previous SeedEdit versions in both edit instruction following and image content (e.g., ID/IP) preservation on real image inputs. In addition to the model upgrade with T2I, in this report we present several key improvements. First, we develop an enhanced data curation pipeline with a meta-info paradigm and meta-info embedding strategy that help mix images from multiple data sources. This allows us to scale editing data effectively, and the meta information helps connect the VLM with the diffusion model more closely. Second, we introduce a joint learning pipeline for computing a diffusion loss and reward losses. Finally, we evaluate SeedEdit 3.0 on our testing benchmarks for real/synthetic image editing, where it achieves the best trade-off between multiple aspects, yielding a high usability rate of 56.1%, compared to SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%).
Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang
Computer Vision

2025.05.21
MMaDA: Multimodal Large Diffusion Language Models
We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. The approach is distinguished by three key innovations: (i) MMaDA adopts a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, eliminating the need for modality-specific components. This architecture ensures seamless integration and processing across different data types. (ii) We implement a mixed long chain-of-thought (CoT) fine-tuning strategy that curates a unified CoT format across modalities. By aligning reasoning processes between textual and visual domains, this strategy facilitates cold-start training for the final reinforcement learning (RL) stage, thereby enhancing the model's ability to handle complex tasks from the outset. (iii) We propose UniGRPO, a unified policy-gradient-based RL algorithm specifically tailored for diffusion foundation models. Utilizing diversified reward modeling, UniGRPO unifies post-training across both reasoning and generation tasks, ensuring consistent performance improvements. Experimental results demonstrate that MMaDA-8B exhibits strong generalization capabilities as a unified multimodal foundation model. It surpasses powerful models like LLaMA-3-7B and Qwen2-7B in textual reasoning, outperforms Show-o and SEED-X in multimodal understanding, and excels over SDXL and Janus in text-to-image generation. These achievements highlight MMaDA's effectiveness in bridging the gap between pretraining and post-training within unified diffusion architectures, providing a comprehensive framework for future research and development. We open-source our code and trained models at:
https://github.com/Gen-Verse/MMaDA
Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang
Computer Vision
Open Positions
Machine Learning Training Framework R&D Engineer/Expert - Seed
Machine Learning Systems Inference Engine Senior Engineer/Expert - Seed
Machine Learning Systems Scheduling Engineer/Expert - Seed
Large Model Inference Storage Systems Engineer/Expert - Seed
AI Heterogeneous Computing Optimization Engineer/Expert - Seed
Machine Learning Systems R&D Intern - Seed