Robix: A Unified Model for Robot Interaction, Reasoning and Planning

Abstract

We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction, enabling robots to follow complex instructions, plan long-horizon tasks, and interact naturally with humans within an end-to-end framework. Robix further introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. At its core, Robix leverages chain-of-thought reasoning and adopts a three-stage training strategy: (1) continued pretraining to enhance foundational embodied reasoning abilities, including 3D spatial understanding, visual grounding, and task-centric reasoning; (2) supervised finetuning to model human-robot interaction and task planning as a unified reasoning-action sequence; and (3) reinforcement learning to improve reasoning-action consistency and long-horizon task coherence. Extensive experiments show that Robix outperforms both open-source and commercial baselines (e.g., GPT-4o and Gemini 2.5 Pro) in interactive task execution and generalizes strongly across diverse instruction types (e.g., open-ended, multi-stage, constrained, invalid, and interrupted) and various user-involved tasks such as table bussing, grocery shopping, and dietary filtering.
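To make the hierarchical split described above concrete, the following is a minimal sketch of a loop in which a high-level planner emits a reasoning trace, an optional atomic command, and an optional verbal response at each step, while a low-level controller executes the commands. It is an illustration under stated assumptions, not the Robix implementation: the names `Step`, `run_episode`, and `scripted_planner` are hypothetical, and the scripted planner stands in for the vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One high-level output (hypothetical structure, not the Robix format)."""
    thought: str             # reasoning trace for this step
    action: Optional[str]    # atomic command for the low-level controller, if any
    response: Optional[str]  # verbal reply to the user, if any


def run_episode(
    planner: Callable[[str, List[Step]], Step],
    controller: Callable[[str], None],
    instruction: str,
    max_steps: int = 10,
) -> List[Step]:
    """Alternate between high-level planning and low-level execution."""
    history: List[Step] = []
    for _ in range(max_steps):
        step = planner(instruction, history)
        history.append(step)
        if step.action is None:
            # No further command: the planner considers the task finished.
            break
        controller(step.action)
    return history


def scripted_planner(instruction: str, history: List[Step]) -> Step:
    """Toy stand-in for the model: replays a fixed two-step plan."""
    plan = ["pick up cup", "place cup in bin"]
    i = len(history)
    if i < len(plan):
        return Step(
            thought=f"Sub-task {i + 1} of {len(plan)} for: {instruction}",
            action=plan[i],
            response=None,
        )
    return Step(thought="All sub-tasks done.", action=None,
                response="The table is clear.")


executed: List[str] = []  # records commands sent to the (mock) controller
history = run_episode(scripted_planner, executed.append, "clear the table")
```

In this sketch the controller is just `executed.append`, so `executed` ends up holding the commands in the order they were issued. A real system would replace the scripted planner with model inference over camera observations and would also need to handle user interruptions arriving mid-episode, which this loop does not model.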
