2025-06-12

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Abstract

Diffusion models have emerged as a mainstream framework for visual generation. Building on this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and selecting the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow-layer learning, and a router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showing significant performance gains and promising scaling properties.
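
The abstract does not give the routing details, so the following is only a minimal sketch of one way to read "tokens and experts compete together": a single top-k taken over every (token, expert) score at once, instead of a fixed number of experts per token. The function name `expert_race_route`, the score matrix, and the budget `k` are hypothetical and are not taken from the paper.

```python
import numpy as np

def expert_race_route(scores, k):
    """Mark the k highest-scoring (token, expert) pairs in the whole matrix.

    scores: array of shape (num_tokens, num_experts) holding router
            affinities (hypothetical input, not the paper's exact router).
    k:      total number of (token, expert) assignments to keep.
    Returns a boolean mask with the same shape as `scores`.
    """
    num_tokens, num_experts = scores.shape
    flat = scores.reshape(-1)
    # Positions of the k largest entries; their internal order is irrelevant.
    top = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top] = True
    return mask.reshape(num_tokens, num_experts)

# Illustrative scores for 3 tokens and 3 experts (made-up numbers).
scores = np.array([
    [0.90, 0.80, 0.70],
    [0.10, 0.20, 0.30],
    [0.60, 0.05, 0.00],
])
mask = expert_race_route(scores, k=4)
experts_per_token = mask.sum(axis=1)  # how many experts each token received
```

Under this reading, the number of experts per token is not fixed: in the toy matrix the first token wins three of the four slots and the second token wins none, which is one way "dynamically assign experts to critical tokens" could play out.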

Authors

Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min

Journal/Conference

ICML 2025
