Vision
The Seed Vision team works on foundation models for visual generation, multimodal generative models, and frontier research and applied R&D on fundamental vision problems based on generative AI.

Research Directions

Visual Generation Foundation Models
Develop foundation models for visual generation (image and video) that offer high interactivity and controllability, learn the visual regularities in video, and explore a wide range of vision tasks built on generative foundation models.
Multimodal
Diffusion Model
Autoregressive Model
Foundation

Multimodal Generative Models
Unified generative models that fuse multiple modalities, jointly modeling generation and understanding, supporting interleaved and simultaneous multimodal generation (e.g., digital humans), and improving generative models' in-context capability and consistency.
Multimodal
Diffusion Model
Autoregressive Model
Foundation

3D/4D Generative Models
Foundation models for 3D/4D generation that learn knowledge of the visual world from video and 3D data, understand 3D space and the physical laws of the physical world, build visual spatial intelligence and world models, and explore physics and rendering engines based on generative models.
3D
4D
World Model

Multimodal Model Design and Optimization
Network architecture design and optimization for multimodal models, diffusion model optimization, efficient large-scale distributed training and inference, and model acceleration and optimization.
Multimodal
Optimization
Distillation
Quantization
Selected Papers

2025.12.15
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10×. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine.
Seed Vision Team
Computer Vision and Pattern Recognition
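The abstract above mentions a dual-branch Diffusion Transformer with a cross-modal joint module. Purely as a rough sketch of what joint attention across separate video and audio token streams can look like (not the actual Seedance 1.5 pro architecture; every class name, shape, and layer choice below is an assumption), here is a minimal PyTorch example:

# Toy sketch of a dual-branch transformer block with a cross-modal joint module.
# Class names, shapes, and the single shared attention layer are assumptions,
# not the actual Seedance 1.5 pro architecture.
import torch
import torch.nn as nn

class CrossModalJointModule(nn.Module):
    """Joint attention over the concatenated video and audio token streams."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_a = nn.LayerNorm(dim)

    def forward(self, video_tokens, audio_tokens):
        # Concatenating the streams lets each modality attend to the other.
        joint = torch.cat([self.norm_v(video_tokens), self.norm_a(audio_tokens)], dim=1)
        fused, _ = self.attn(joint, joint, joint)
        n_video = video_tokens.shape[1]
        # Split back into branches and add residual connections.
        return video_tokens + fused[:, :n_video], audio_tokens + fused[:, n_video:]

class DualBranchBlock(nn.Module):
    """One block: a shared joint module followed by per-modality feed-forward layers."""

    def __init__(self, dim):
        super().__init__()
        self.joint = CrossModalJointModule(dim)
        self.video_ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                      nn.GELU(), nn.Linear(4 * dim, dim))
        self.audio_ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                      nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, video_tokens, audio_tokens):
        video_tokens, audio_tokens = self.joint(video_tokens, audio_tokens)
        return (video_tokens + self.video_ff(video_tokens),
                audio_tokens + self.audio_ff(audio_tokens))

# Toy usage: batch of 2, with 64 video tokens and 32 audio tokens of width 256.
block = DualBranchBlock(dim=256)
video, audio = torch.randn(2, 64, 256), torch.randn(2, 32, 256)
video_out, audio_out = block(video, audio)   # shapes are preserved per branch

In a real dual-branch design each branch would carry its own full transformer stack, timestep conditioning, and positional encodings; the point of the sketch is only that fusing the two token streams inside a shared attention layer lets each modality condition on the other, which is one way to obtain tight audio-visual synchronization.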

2025.06.11
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Notable advances in diffusion modeling have propelled rapid improvements in video generation, yet current foundational models still confront critical challenges in synergistically balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient pre-training paradigm that enables multiple features and functions such as interleaved multimodal positional encoding, native multi-shot generation capacity, and multi-task modeling; (iii) carefully designed post-training optimization leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms for considerable performance improvements; (iv) excellent model acceleration achieving a 10× inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds. Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation: superior spatiotemporal fluidity with structural stability, precise instruction adherence in complex multi-subject contexts, native multi-shot narrative coherence with consistent subject representation, and ultra-fast inference.
Seed Vision Team
Computer Vision
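The report attributes part of its 10× inference speedup to multi-stage distillation. As a loose, generic illustration of step distillation for diffusion samplers (not Seedance 1.0's actual strategy; the Euler-style update, toy network, and all names below are assumptions), the following sketch trains a student to match two teacher denoising steps with a single larger step, halving the number of network evaluations per stage:

# Toy sketch of step distillation for a diffusion sampler.
# The Euler-style update, the stand-in network, and all names are assumptions,
# not Seedance 1.0's actual acceleration strategy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVelocityNet(nn.Module):
    """Stand-in for a video diffusion backbone; predicts a velocity field."""

    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        t_feat = torch.full((x.shape[0], 1), float(t))  # broadcast timestep per sample
        return self.net(torch.cat([x, t_feat], dim=-1))

def teacher_two_steps(teacher, x_t, t, dt):
    """Two small Euler-style denoising updates with the frozen teacher."""
    x_mid = x_t - dt * teacher(x_t, t)
    return x_mid - dt * teacher(x_mid, t - dt)

def distill_step_loss(student, teacher, x_t, t, dt):
    """One big student step of size 2*dt regressed onto the two-step teacher target."""
    with torch.no_grad():
        target = teacher_two_steps(teacher, x_t, t, dt)
    x_student = x_t - 2 * dt * student(x_t, t)
    return F.mse_loss(x_student, target)

# Toy usage: each distillation stage halves the number of sampling steps,
# so a few stages compound into a large end-to-end speedup.
teacher, student = ToyVelocityNet(), ToyVelocityNet()
loss = distill_step_loss(student, teacher, torch.randn(8, 16), t=1.0, dt=0.1)
loss.backward()

Applying such a stage several times, combined with system-level optimizations such as quantization or fused kernels, is one generic route to an order-of-magnitude reduction in sampling cost.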

2025.06.05
SeedEdit 3.0: Fast and High-Quality Generative Image Editing
We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both edit instruction following and image content (e.g., ID/IP) preservation on real image inputs. In addition to the model upgrade with T2I, this report presents several key improvements. First, we develop an enhanced data curation pipeline with a meta-info paradigm and meta-info embedding strategy that help mix images from multiple data sources. This allows us to scale editing data effectively, and the meta information helps connect the VLM with the diffusion model more closely. Second, we introduce a joint learning pipeline for computing a diffusion loss and reward losses. Finally, we evaluate SeedEdit 3.0 on our testing benchmarks for real/synthetic image editing, where it achieves the best trade-off between multiple aspects, yielding a high usability rate of 56.1%, compared to SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%).
Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang
Computer Vision
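The abstract above describes a joint learning pipeline that combines a diffusion loss with reward losses. The toy sketch below shows one generic way such an objective can be assembled, using a velocity-prediction diffusion loss plus differentiable reward scores on a one-step reconstruction; the noising schedule, weighting, and all names are assumptions and do not describe SeedEdit 3.0's actual pipeline:

# Toy sketch of a joint objective mixing a diffusion loss with reward losses.
# The linear noising schedule, velocity parameterization, weighting, and every name
# here are assumptions, not the actual SeedEdit 3.0 training recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

def joint_edit_loss(model, x0, cond, reward_models, reward_weight=0.1):
    """Diffusion loss on noised latents plus reward losses on a one-step reconstruction.

    model          - predicts a velocity field from (noised latent, conditioning, timestep)
    x0             - latent of the edited target image, shape (batch, dim)
    cond           - conditioning features (e.g. instruction + source-image embedding)
    reward_models  - callables that score a reconstruction; higher is better
    """
    t = torch.rand(x0.shape[0], 1)                      # random timestep per sample
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise                      # toy linear noising schedule
    v_pred = model(torch.cat([x_t, cond, t], dim=-1))
    diffusion_loss = F.mse_loss(v_pred, noise - x0)     # velocity target for this schedule

    x0_pred = x_t - t * v_pred                          # one-step reconstruction of the latent
    reward_loss = sum(-rm(x0_pred).mean() for rm in reward_models)
    return diffusion_loss + reward_weight * reward_loss

# Toy usage with stand-in modules; real reward models would score instruction
# following, ID/IP preservation, aesthetics, etc. on decoded images.
dim, cond_dim, batch = 16, 8, 4
model = nn.Sequential(nn.Linear(dim + cond_dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
reward_models = [nn.Linear(dim, 1)]
loss = joint_edit_loss(model, torch.randn(batch, dim), torch.randn(batch, cond_dim), reward_models)
loss.backward()

In practice the reward gradients would be balanced against the diffusion term far more carefully than the single scalar weight used here.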
Technical Capabilities
Seedance
Seedance 1.5 pro is a joint audio-video generation model that responds precisely to complex camera instructions and generates natural visuals with matching narrative audio, enabling fast production of videos with integrated sound and picture.

SeedEdit
A general-purpose image editing model. With simple natural-language input, it performs a wide range of editing operations on images, including retouching, outfit changes, beautification, style transfer, and adding or removing elements in specified regions.

Seedream
Through scaling of the overall model, Seedream 4.5 achieves improvements across the board: it precisely identifies and stably locks onto subjects in multi-image compositions, preserves the original images' features and fine textures to the greatest extent, and further strengthens layout rendering for text-dense content such as posters, producing highly consistent, high-fidelity visual results.
Open Positions
AIGC Algorithm Expert - Image Generation - Seed
AIGC Algorithm Expert - Video Generation - Seed
3D Generation Algorithm Engineer - Seed
Visual Multimodal Algorithm Research Intern - Top Seed Intern
Server-Side Development Intern (Data) - Seed
AIGC Model Optimization Intern - Seed