OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

Abstract

Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite this progress, general-purpose robotics applications still lag behind, even though they rely on these same fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers an integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results demonstrate that OK-Robot achieves a 58.5% success rate in open-ended pick-and-drop tasks, a new state of the art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. In cleaner, uncluttered environments, OK-Robot's performance increases to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments are available on our website: https://ok-robot.github.io
