2025-07-24

Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice


Abstract

Simultaneous Interpretation (SI) is one of the most daunting frontiers in the translation industry. Product-level automatic SI systems have long been plagued by intractable challenges: subpar transcription and translation quality, lack of real-time speech generation, confusion among multiple speakers, and inflation of the translated speech, especially in long-form discourse. In this study, we introduce Seed-LiveInterpret 2.0, an end-to-end SI model that delivers high-fidelity, ultra-low-latency speech-to-speech generation with voice cloning. As a fully operational product-level solution, Seed-LiveInterpret 2.0 tackles these challenges head-on through our novel duplex speech-to-speech understanding-generating framework. Experimental results show that, through large-scale pretraining and reinforcement learning, the model achieves a significantly better balance between translation accuracy and latency, and human interpreters judged its output to exceed 70% correctness in complex scenarios. Notably, Seed-LiveInterpret 2.0 outperforms commercial SI solutions by significant margins in translation quality, while cutting the average latency of cloned speech from nearly 10 seconds to a near-real-time 3 seconds. This reduction of roughly 70% substantially improves practical usability.

Authors

Seed Speech Team

Journal/Conference

arXiv


Pursuing the upper limits of intelligence, creating value for society

Join ByteDance Seed

Copyright © 2026 Bytedance Seed

Contact us: seed.feedback@bytedance.com