2025-12-15

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Abstract

Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10×. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine.

Authors

Seed Vision Team

Copyright © 2026 Bytedance Seed
Contact us: seed.feedback@bytedance.com