Vision
The Seed Vision team works on foundation models for visual generation, multimodal generative models, and frontier research and applied R&D on fundamental vision problems in generative AI.
Research Directions
Visual Generation Foundation Models
Build foundation models for visual generation (image and video), deliver highly interactive and controllable generation, understand the visual regularities in video, and explore a wide range of vision tasks on top of generative foundation models.
Multimodal
Diffusion Model
Autoregressive Model
Foundation
Multimodal Generative Models
Build unified generative models that fuse multiple modalities, jointly model generation and understanding, support interleaved and simultaneous multimodal generation (e.g., digital humans), and improve the in-context capability and consistency of generative models.
Multimodal
Diffusion Model
Autoregressive Model
Foundation
3D/4D Generative Models
Build foundation models for 3D/4D generation that learn knowledge of the visual world from video and 3D data, understand 3D space and the physical laws of the real world, build visual spatial intelligence and world models, and explore physics and rendering engines based on generative models.
3D
4D
World Model
Multimodal Model Design and Optimization
Design and optimize multimodal model architectures, optimize diffusion models, build efficient large-scale distributed training and inference, and accelerate and optimize models.
Multimodal
Optimization
Distillation
Quantization

Featured Papers

2025.06.11
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Notable advances in diffusion modeling have propelled rapid improvements in video generation, yet current foundation models still confront critical challenges in synergistically balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient pre-training paradigm that enables multiple features such as interleaved multimodal positional encoding, native multi-shot generation capacity, and multi-task modeling; (iii) carefully designed post-training optimization leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms, yielding considerable performance improvements; (iv) excellent model acceleration achieving a 10× inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds. Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality, fast video generation: superior spatiotemporal fluidity and structural stability, precise instruction adherence in complex multi-subject contexts, native multi-shot narrative coherence with consistent subject representation, and ultra-fast inference.
Seed Vision Team
Computer Vision
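The Seedance 1.0 abstract above mentions video-specific RLHF with multi-dimensional reward mechanisms without detailing the recipe. The following is a minimal, hypothetical sketch of how several reward dimensions might be aggregated and used to weight a per-sample training loss; the reward names, weights, and reward-weighted objective are illustrative assumptions, not Seedance 1.0's actual method.

import torch

def aggregate_rewards(scores, weights):
    """Combine per-dimension reward scores (e.g. prompt following,
    motion plausibility, visual quality) into one scalar per sample."""
    total = torch.zeros_like(next(iter(scores.values())))
    for name, score in scores.items():
        total = total + weights.get(name, 1.0) * score
    return total

def reward_weighted_loss(per_sample_loss, rewards):
    """Reward-weighted regression: emphasize samples with higher reward."""
    advantage = rewards - rewards.mean()        # center rewards within the batch
    weights = torch.softmax(advantage, dim=0)   # higher reward -> larger weight
    return (weights * per_sample_loss).sum()

# Hypothetical usage with three reward dimensions for two sampled videos.
scores = {
    "prompt_following": torch.tensor([0.8, 0.5]),
    "motion_plausibility": torch.tensor([0.6, 0.9]),
    "visual_quality": torch.tensor([0.7, 0.4]),
}
rewards = aggregate_rewards(scores, {"prompt_following": 1.0,
                                     "motion_plausibility": 0.5,
                                     "visual_quality": 0.5})
loss = reward_weighted_loss(torch.tensor([1.2, 0.9]), rewards)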
2025.06.05
SeedEdit 3.0: Fast and High-Quality Generative Image Editing
We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over previous SeedEdit versions in both edit instruction following and image content (e.g., ID/IP) preservation on real image inputs. Beyond the T2I model upgrade, in this report we present several key improvements. First, we develop an enhanced data curation pipeline with a meta-info paradigm and a meta-info embedding strategy that help mix images from multiple data sources. This allows us to scale editing data effectively, and the meta information helps connect the VLM with the diffusion model more closely. Second, we introduce a joint learning pipeline that combines a diffusion loss with reward losses. Finally, we evaluate SeedEdit 3.0 on our testing benchmarks for real/synthetic image editing, where it achieves the best trade-off across multiple aspects, yielding a high usability rate of 56.1%, compared to SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%).
Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang
Computer Vision
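The SeedEdit 3.0 abstract describes a joint learning pipeline that computes a diffusion loss together with reward losses, without specifying how the terms are combined. Below is one plausible reading under common practice, a denoising MSE term plus differentiable reward terms, offered as a sketch only; the function names, dummy reward model, and weighting are assumptions, not SeedEdit's actual implementation.

import torch
import torch.nn.functional as F

def joint_edit_loss(pred_noise, true_noise, edited_image,
                    reward_models, reward_weight=0.1):
    """Hypothetical joint objective: diffusion loss plus reward losses."""
    # Standard denoising (diffusion) objective on the predicted noise.
    diffusion_loss = F.mse_loss(pred_noise, true_noise)
    # Each reward model scores the edited image (e.g. instruction following,
    # ID/IP preservation); maximizing reward means minimizing its negative.
    reward_loss = sum(-rm(edited_image).mean() for rm in reward_models)
    return diffusion_loss + reward_weight * reward_loss

# Hypothetical usage with a dummy differentiable reward model.
dummy_reward = lambda img: img.mean(dim=(1, 2, 3))   # stands in for a learned scorer
edited = torch.randn(2, 3, 64, 64, requires_grad=True)
loss = joint_edit_loss(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64),
                       edited, [dummy_reward])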
2025.04.15
Seedream 3.0 Technical Report
We present Seedream 3.0, a high-performance Chinese-English bilingual image generation foundation model. We develop several technical improvements to address existing challenges in Seedream 2.0, including alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions. Specifically, the advancements of Seedream 3.0 stem from improvements across the entire pipeline, from data construction to model deployment. At the data stratum, we double the dataset using a defect-aware training paradigm and a dual-axis collaborative data-sampling framework. Furthermore, we adopt several effective techniques such as mixed-resolution training, cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling in the pre-training phase. During the post-training stage, we utilize diversified aesthetic captions in SFT and a VLM-based reward model with scaling, thereby achieving outputs that align well with human preferences. In addition, Seedream 3.0 pioneers a novel acceleration paradigm: by employing consistent noise expectation and importance-aware timestep sampling, we achieve a 4 to 8 times speedup while maintaining image quality. Seedream 3.0 demonstrates significant improvements over Seedream 2.0: it enhances overall capabilities, in particular text rendering of complicated Chinese characters, which is important for professional typography generation. In addition, it provides native high-resolution output (up to 2K), allowing it to generate images with high visual quality.
Seed Vision Team
Computer Vision
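Among the pre-training techniques listed in the Seedream 3.0 abstract is resolution-aware timestep sampling. The sketch below illustrates one common way such sampling is realized in the literature: shifting sampled timesteps toward higher noise levels as resolution grows, since larger images need more noise to destroy their global structure. The shift rule and reference resolution here are assumptions, not the paper's exact formulation.

import torch

def sample_timesteps(batch_size, height, width, base_res=256 * 256):
    """Resolution-aware timestep sampling (illustrative): bias t toward
    higher noise levels for larger images via a resolution-dependent shift."""
    t = torch.rand(batch_size)                        # uniform t in [0, 1)
    shift = ((height * width) / base_res) ** 0.5      # larger image -> larger shift
    return shift * t / (1.0 + (shift - 1.0) * t)      # monotone map, keeps t in [0, 1)

# Example: timesteps for 1024x1024 images skew toward t close to 1
# (more noise, under a convention where t = 1 is pure noise).
ts = sample_timesteps(4, 1024, 1024)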
Technology Showcase
Seedance
A foundation model for video generation. It accurately interprets complex instructions, generates 1080p videos with fluid motion and rich detail, and natively supports coherent multi-shot storytelling.
SeedEdit
A general-purpose image editing model. With simple natural-language instructions, it performs a wide range of edits, including retouching, outfit changes, beautification, style transfer, and adding or removing elements in specified regions.
Seedream
A foundation model for image generation. It offers native high resolution and Chinese-English bilingual support, and excels in structural accuracy, object-count accuracy, multi-object attribute relations, small-text rendering and typography, aesthetics, and realism.

Open Positions

AIGC Algorithm Expert - Image Generation - Seed
Beijing/Shanghai/Shenzhen/Hangzhou
Experienced hire
AIGC Algorithm Expert - Video Generation - Seed
Beijing/Shanghai/Shenzhen/Hangzhou
Experienced hire
3D Generation Algorithm Engineer - Seed
Beijing/Shanghai/Shenzhen/Hangzhou
Experienced hire
Vision Multimodal Algorithm Research Intern - Top Seed Intern
Beijing/Shanghai/Shenzhen/Hangzhou
Internship
Server-side Development Intern (Data) - Seed
Beijing
Internship
AIGC Model Optimization Intern - Seed
Beijing/Shenzhen
Internship