Robix: A Unified Model for Robot Interaction, Reasoning and Planning

Abstract

We introduce Robix, a unified model that integrates robot reasoning, task planning, and natural language interaction within a single vision-language architecture. Acting as the high-level cognitive layer in a hierarchical robot system, Robix dynamically generates atomic commands for the low-level controller and verbal responses for human interaction. This enables robots to follow complex instructions, plan long-horizon tasks, and interact naturally with humans within an end-to-end framework. Robix further introduces novel capabilities such as proactive dialogue, real-time interruption handling, and context-aware commonsense reasoning during task execution. At its core, Robix leverages chain-of-thought reasoning and adopts a three-stage training strategy: (1) continued pretraining to enhance foundational embodied reasoning abilities, including 3D spatial understanding, visual grounding, and task-centric reasoning; (2) supervised fine-tuning to model human-robot interaction and task planning as a unified reasoning-action sequence; and (3) reinforcement learning to improve reasoning-action consistency and long-horizon task coherence. Extensive experiments show that Robix outperforms both open-source and commercial baselines (e.g., GPT-4o and Gemini 2.5 Pro) in interactive task execution, and that it generalizes well across diverse instruction types (e.g., open-ended, multi-stage, constrained, invalid, and interrupted) and various user-involved tasks such as table bussing, grocery shopping, and dietary filtering.
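
The hierarchical arrangement described above, in which a high-level layer emits a reasoning trace, an optional atomic command, and an optional verbal response at each step, can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the Step schema, the run_episode loop, and the scripted_planner stand-in are not taken from Robix, and the sketch omits the visual observations the real model conditions on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One reasoning-action step from the high-level layer (hypothetical schema)."""
    thought: str              # chain-of-thought text
    action: Optional[str]     # atomic command for the low-level controller, if any
    response: Optional[str]   # verbal reply to the user, if any


def run_episode(
    planner: Callable[[str, List[Step]], Step],
    controller: Callable[[str], None],
    speak: Callable[[str], None],
    instruction: str,
    max_steps: int = 10,
) -> List[Step]:
    """Alternate between high-level planning and low-level execution."""
    history: List[Step] = []
    for _ in range(max_steps):
        step = planner(instruction, history)
        history.append(step)
        if step.response is not None:
            speak(step.response)
        if step.action is None:
            # Assumption: a step without a command means the task is finished.
            break
        controller(step.action)
    return history


def scripted_planner(instruction: str, history: List[Step]) -> Step:
    """Stand-in for the model: replays a fixed plan instead of reasoning over images."""
    plan = ["pick up the cup", "place the cup in the bin"]
    if len(history) < len(plan):
        return Step(
            thought=f"Next subtask for: {instruction}",
            action=plan[len(history)],
            response=None,
        )
    return Step(thought="All subtasks done.", action=None, response="The table is clear.")
```

In this sketch the controller and the speech channel are passed in as plain callables, so a real low-level policy or a text-to-speech module could replace the stubs without changing the loop.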
