2025-06-20

Polybasic Speculative Decoding Through a Theoretical Perspective

Abstract

Inference latency is a critical bottleneck in the large-scale deployment of Large Language Models (LLMs). Speculative decoding methods have recently shown promise in accelerating inference without changing the output distribution. However, existing work typically relies on a dualistic draft-verify framework and lacks rigorous theoretical grounding. In this paper, we introduce a novel polybasic speculative decoding framework, underpinned by a comprehensive theoretical analysis. Specifically, we prove a fundamental theorem that characterizes the optimal inference time for multi-model speculative decoding systems, showing how to extend the dualistic approach to a more general polybasic paradigm. Through our theoretical investigation of multi-model token generation, we expose and optimize the interplay between model capabilities, acceptance lengths, and overall computational cost. Our framework supports both standalone implementation and integration with existing speculative techniques, leading to accelerated performance in practice. Experimental results across multiple model families demonstrate that our approach yields speedup ratios ranging from 3.31× to 4.01× for LLaMA2-Chat 7B, up to 3.87× for LLaMA3-8B, up to 4.43× for Vicuna-7B, and up to 3.85× for Qwen2-7B, all while preserving the original output distribution. We release our theoretical proofs and implementation code to facilitate further investigation into polybasic speculative decoding.

Authors

Ruilin Wang, Huixia Li, Yuexiao Ma, Xiawu Zheng, Fei Chao, Xuefeng Xiao, Rongrong Ji

Journal/Conference

ICML 2025
