
2025-09-01

Robix: A Unified Model for Robot Interaction, Reasoning and Planning


ABSTRACT

We introduce Robix, a unified vision-language model designed to serve as the high-level cognitive layer in a hierarchical robot system, integrating robot reasoning, task planning, and natural language interaction within a single architecture. Robix dynamically generates atomic commands for low-level controllers alongside verbal responses for human interaction, enabling end-to-end execution of complex instructions, long-horizon task planning, and natural human-robot collaboration. The model also introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. At its core, Robix employs chain-of-thought reasoning and is trained through a three-stage strategy: (1) continued pretraining to enhance embodied reasoning skills like 3D spatial understanding, visual grounding, and task-centric reasoning; (2) supervised finetuning to model human-robot interaction and task planning as a unified reasoning-action sequence; and (3) reinforcement learning to improve reasoning-action consistency and long-horizon task coherence. Extensive experiments show that Robix outperforms both open-source and commercial baselines—including GPT-4o and Gemini 2.5 Pro—in interactive task execution, demonstrating strong generalization across diverse instruction types (e.g., open-ended, multi-stage, constrained, invalid, and interrupted) and various user-involved tasks such as table bussing, grocery shopping, and dietary filtering.
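As an illustration only, the split described above (one high-level model that emits a reasoning trace plus an atomic command for the low-level controller and/or a verbal response for the user) could be sketched as the loop below. All names here (`Step`, `run_episode`, `toy_policy`, and the `pick(...)` command string) are hypothetical and not taken from the paper, and the policy is a scripted stub standing in for the vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One turn of a hypothetical unified reasoning-action output."""
    thought: str                     # reasoning trace for this turn
    action: Optional[str] = None     # atomic command for the low-level controller
    response: Optional[str] = None   # verbal reply to the human


Policy = Callable[[str, List[Step]], Step]


def run_episode(
    policy: Policy,
    observations: List[str],
    execute: Callable[[str], None],
    speak: Callable[[str], None],
) -> List[Step]:
    """Feed observations to the policy and route each output to its channel."""
    history: List[Step] = []
    for obs in observations:
        step = policy(obs, history)
        if step.action is not None:
            execute(step.action)     # hand off to the low-level controller
        if step.response is not None:
            speak(step.response)     # reply to the user
        history.append(step)
    return history


def toy_policy(obs: str, history: List[Step]) -> Step:
    """Scripted stand-in for the model (assumption, not the paper's method)."""
    if obs.startswith("user:"):
        # Treat any user utterance as an interruption and answer verbally.
        return Step(thought="The user interrupted the task.",
                    response="Okay, stopping.")
    return Step(thought=f"I see a {obs}; pick it up.", action=f"pick({obs})")


if __name__ == "__main__":
    run_episode(toy_policy, ["cup", "user: stop"], print, print)
```

Keeping the action and the response as separate optional fields lets a single turn carry a command, a reply, both, or neither, which is one simple way to represent interruptions and proactive dialogue in the same sequence.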

AUTHORS

Huang Fang, Mengxi Zhang, Heng Dong, Wei Li, Zixuan Wang, Qifeng Zhang, Xueyun Tian, Yucheng Hu, Hang Li

VENUE

arXiv

Copyright © 2026 Bytedance Seed
Disclaimer
Contact us : seed.feedback@bytedance.com