CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Abstract

Real-life robot navigation involves more than just reaching a destination; it requires optimizing movements while addressing scenario-specific goals. An intuitive way for humans to express these goals is through abstract cues such as verbal commands or rough sketches. Such guidance may lack detail or be noisy, yet we still expect robots to navigate as intended. For robots to interpret and execute these abstract instructions in line with human expectations, they must share a common understanding of basic navigation concepts with humans. To this end, we introduce CANVAS, a novel framework that combines visual and linguistic instructions for commonsense-aware navigation. Its success is driven by imitation learning, which enables the robot to learn from human navigation behavior. We present COMMAND, a comprehensive dataset of human-annotated navigation results spanning over 48 hours and 219 km, designed to train commonsense-aware navigation systems in simulated environments. Our experiments show that CANVAS outperforms the strong rule-based system ROS NavStack across all environments and shows superior performance even with noisy instructions. Notably, in the orchard environment, where ROS NavStack records a 0% total success rate, CANVAS achieves a total success rate of 67%. CANVAS also closely aligns with human demonstrations and commonsense constraints, even in unseen environments. Furthermore, real-world deployment of CANVAS shows strong Sim2Real transfer with a total success rate of 69%, highlighting the potential of learning from human demonstrations in simulated environments for real-world applications.
