Vision
The Seed Vision team focuses on foundational models for visual generation, developing multimodal generative models and conducting leading research and application development to solve fundamental computer vision challenges in GenAI.

Research topics

Foundational models for visual generation
Researching and developing foundational models for visual generation (images and videos), ensuring high interactivity and controllability in visual generation, understanding patterns in videos, and exploring various visual-oriented tasks based on generative foundational models.
Multimodal
Diffusion Model
Auto Regression Model
Foundation

Multimodal generative models
Integrating various modalities into a unified generative model, jointly modeling generation and understanding, supporting interleaved and simultaneous generation across modalities (such as digital avatars), and enhancing the contextual capabilities and consistency of generative models.
Multimodal
Diffusion Model
Auto Regression Model
Foundation

3D/4D generative models
3D/4D foundational generative models, learning visual world knowledge from video and 3D data, understanding the physical world's 3D space and physical laws, building spatial intelligence and world models, and exploring physics and rendering engines based on generative models.
3D
4D
World Model

Multimodal model design and optimization
Design and optimize multimodal model network architectures, optimize diffusion models, carry out efficient large-scale distributed training and inference, and push for model acceleration and optimization.
Multimodal
Optimization
Distillation
Quantization
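The quantization work mentioned above can be illustrated with a minimal sketch. This is a generic symmetric per-tensor int8 weight quantization in pure Python, not the team's actual acceleration pipeline; the function names and the per-tensor granularity are illustrative assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)       # q == [50, -127, 0, 100]
recovered = dequantize_int8(q, scale)   # close to the original weights
```

Real deployments typically use per-channel scales and calibrated activation quantization; this sketch only shows the core round-to-grid idea.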
Selected Papers

Dec 15, 2025
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10×. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine.
Seed Vision Team
Computer Vision and Pattern Recognition

Jun 11, 2025
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Notable advances in diffusion modeling have propelled rapid improvements in video generation, yet current foundational models still confront critical challenges in synergistically balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient pre-training paradigm that enables multiple features and functions such as interleaved multimodal positional encoding, native multi-shot generation capacity, and multi-task modeling; (iii) carefully designed post-training optimization leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms for considerable performance improvements; (iv) excellent model acceleration achieving a 10× inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds. Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation: superior spatiotemporal fluidity with structural stability, precise instruction adherence in complex multi-subject contexts, native multi-shot narrative coherence with consistent subject representation, and ultra-fast inference.
Seed Vision Team
Computer Vision
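The interleaved multimodal positional encoding mentioned in point (ii) can be sketched as assigning each token both a global position over the interleaved stream and a per-modality local position, so cross-modal order and within-modality order are each recoverable. The scheme below is a hypothetical simplification, not the model's actual encoding.

```python
def interleave_positions(segments):
    """Assign (global_pos, modality, local_pos) to every token in an
    interleaved stream. `segments` is a list of (modality, n_tokens) pairs
    in stream order, e.g. a text prompt followed by video frame tokens."""
    positions = []
    global_pos = 0
    local = {}  # running per-modality counters
    for modality, n_tokens in segments:
        for _ in range(n_tokens):
            local_pos = local.get(modality, 0)
            positions.append((global_pos, modality, local_pos))
            local[modality] = local_pos + 1
            global_pos += 1
    return positions

stream = [("text", 2), ("video", 3), ("text", 1)]
pos = interleave_positions(stream)
# e.g. the final text token gets (5, "text", 2): sixth overall, third text token
```

In a real model these integer positions would feed a rotary or learned embedding; the point here is only that both orderings survive interleaving.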

Jun 05, 2025
SeedEdit 3.0: Fast and High-Quality Generative Image Editing
We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both edit-instruction following and image content (e.g., ID/IP) preservation on real image inputs. In addition to the model upgrade with T2I, in this report we present several key improvements. First, we develop an enhanced data curation pipeline with a meta-info paradigm and meta-info embedding strategy that help mix images from multiple data sources. This allows us to scale editing data effectively, and the meta information helps connect the VLM with the diffusion model more closely. Second, we introduce a joint learning pipeline for computing a diffusion loss and reward losses. Finally, we evaluate SeedEdit 3.0 on our testing benchmarks for real/synthetic image editing, where it achieves the best trade-off across multiple aspects, yielding a high usability rate of 56.1%, compared to SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%).
Peng Wang, Yichun Shi, Xiaochen Lian, Zhonghua Zhai, Xin Xia, Xuefeng Xiao, Weilin Huang, Jianchao Yang
Computer Vision
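The joint learning pipeline above, which combines a diffusion loss with reward losses, can be sketched as a weighted sum. The weights, loss values, and reward names here are illustrative assumptions, not the paper's actual training configuration.

```python
def joint_loss(diffusion_loss, reward_losses, reward_weights):
    """Combine the diffusion reconstruction loss with weighted reward losses
    (e.g. hypothetical instruction-following and ID-preservation rewards)."""
    assert len(reward_losses) == len(reward_weights)
    return diffusion_loss + sum(w * r for w, r in zip(reward_weights, reward_losses))

# Toy values: diffusion loss 0.8, two reward losses weighted 1.0 and 0.5.
total = joint_loss(0.8, [0.2, 0.5], [1.0, 0.5])  # 0.8 + 0.2 + 0.25 = 1.25
```

In practice the reward terms would come from learned reward models scored on generated edits, with weights tuned so the rewards steer generation without degrading reconstruction.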
Technical capability demonstration
Seedance
Seedance 1.5 pro is a joint audio-video model that accurately follows complex instructions.

SeedEdit
A universal image editing model that enables diverse editing operations on images through simple natural language input, including photo retouching, clothing replacement, beautification, style transformation, and adding or removing elements in specified regions.

Seedream
Seedream 4.5 achieves an all-round improvement through the overall scaling of the model. It accurately identifies the main subjects in multi-image editing, strictly preserves the details of the reference images, and further enhances the typography and dense text rendering capabilities, so that it can deliver professional visual creatives with high consistency and fidelity.
Featured Jobs
Research Scientist, Multimodal Foundation Model (Singapore, Experienced Hiring)
Research Scientist, Foundation Model, Video Generation (San Jose, Experienced Hiring)
Research Engineer, Foundation Model AI Platform (San Jose, Experienced Hiring)
Research Scientist Graduate (Foundation Model, Video Generation) - 2025 Start (PhD) (San Jose, Campus Recruitment)
Student Researcher (Seed - Foundation Model, Video Generation) - 2025 Start (PhD) (San Jose, Internship)
Student Researcher (Seed - Foundation Model AI Platform) - 2025 Start (PhD) (San Jose, Internship)