OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

January 22, 2024
Authors: Peiqi Liu, Yaswanth Orru, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto
cs.AI

Abstract

Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite these advancements, general-purpose applications of robotics still lag behind, even though they rely on these fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers an integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results demonstrate that OK-Robot achieves a 58.5% success rate in open-ended pick-and-drop tasks, representing a new state of the art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. In cleaner, uncluttered environments, OK-Robot's performance increases to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments are available on our website: https://ok-robot.github.io
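The abstract describes a pipeline that chains three capabilities: language-driven object detection, navigation, and grasping. The sketch below is only an illustration of how such stages might be composed for a pick-and-drop task; it is not the authors' implementation. Every name in it (`pick_and_drop`, `locate`, `navigate_to`, `grasp`, `drop`, `PickAndDropResult`) is hypothetical, and the stage functions are passed in as plain callables so that no real robot or model API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical 3D position type; not taken from the paper.
Position = Tuple[float, float, float]


@dataclass
class PickAndDropResult:
    success: bool
    failed_stage: Optional[str] = None  # name of the first stage that failed


def pick_and_drop(
    object_query: str,
    receptacle_query: str,
    locate: Callable[[str], Optional[Position]],  # stand-in for a VLM-based detector
    navigate_to: Callable[[Position], bool],      # stand-in for a navigation primitive
    grasp: Callable[[str], bool],                 # stand-in for a grasping primitive
    drop: Callable[[], bool],                     # stand-in for a release primitive
) -> PickAndDropResult:
    """Run the stages in order and stop at the first one that fails."""
    object_position = locate(object_query)
    if object_position is None:
        return PickAndDropResult(False, "locate_object")
    if not navigate_to(object_position):
        return PickAndDropResult(False, "navigate_to_object")
    if not grasp(object_query):
        return PickAndDropResult(False, "grasp")

    receptacle_position = locate(receptacle_query)
    if receptacle_position is None:
        return PickAndDropResult(False, "locate_receptacle")
    if not navigate_to(receptacle_position):
        return PickAndDropResult(False, "navigate_to_receptacle")
    if not drop():
        return PickAndDropResult(False, "drop")
    return PickAndDropResult(True)


if __name__ == "__main__":
    # Toy stand-ins: a lookup table plays the detector, the rest always succeed.
    known_objects = {"red mug": (1.0, 2.0, 0.5), "sink": (3.0, 0.0, 0.9)}
    result = pick_and_drop(
        "red mug",
        "sink",
        locate=known_objects.get,
        navigate_to=lambda position: True,
        grasp=lambda query: True,
        drop=lambda: True,
    )
    print(result)
```

Because the task succeeds only when every stage succeeds, a weakness in any single module limits the whole system, which is one way to read the abstract's point about nuanced details mattering at the integration level.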