OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics

January 22, 2024
Authors: Peiqi Liu, Yaswanth Orru, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto
cs.AI

Abstract

Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite these advancements, general-purpose robotics applications still lag behind, even though they rely on exactly these fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers an integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results show that OK-Robot achieves a 58.5% success rate on open-ended pick-and-drop tasks, a new state of the art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. In cleaner, uncluttered environments, OK-Robot's success rate rises to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments are available on our website: https://ok-robot.github.io
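The abstract describes OK-Robot as a composition of existing modules (a VLM for finding objects, a navigation primitive, a grasping primitive) rather than a trained end-to-end policy. What follows is a minimal Python sketch of that kind of composition, assuming each module can be injected as a plain callable. The function names (`pick_and_drop`, `locate`, `navigate`, `grasp`, `drop`, `succeeded`), the stage order, the dictionary standing in for a semantic memory, and the stop-at-first-failure policy are all hypothetical illustrations, not OK-Robot's actual code or API. The sketch shows why every module has to work for the task to succeed: a single failing stage ends the attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical 3D position of an object or receptacle in the robot's map frame.
Point = Tuple[float, float, float]


@dataclass
class StageResult:
    stage: str
    ok: bool


def pick_and_drop(
    object_query: str,
    target_query: str,
    locate: Callable[[str], Optional[Point]],  # hypothetical stand-in for a VLM-based lookup of a language query
    navigate: Callable[[Point], bool],  # hypothetical stand-in for a navigation primitive
    grasp: Callable[[str], bool],  # hypothetical stand-in for a grasping primitive
    drop: Callable[[], bool],  # hypothetical stand-in for releasing the object at the target
) -> List[StageResult]:
    """Run the modules in a fixed order, stopping at the first failed stage."""
    log: List[StageResult] = []

    def record(stage: str, ok: bool) -> bool:
        log.append(StageResult(stage, ok))
        return ok

    object_pos = locate(object_query)
    if not record("locate object", object_pos is not None):
        return log
    if not record("navigate to object", navigate(object_pos)):
        return log
    if not record("grasp", grasp(object_query)):
        return log
    target_pos = locate(target_query)
    if not record("locate target", target_pos is not None):
        return log
    if not record("navigate to target", navigate(target_pos)):
        return log
    record("drop", drop())
    return log


def succeeded(log: List[StageResult]) -> bool:
    # Success only if the final "drop" stage was reached and every recorded stage succeeded.
    return bool(log) and log[-1].stage == "drop" and all(r.ok for r in log)


if __name__ == "__main__":
    # Hypothetical semantic memory mapping language queries to positions.
    memory = {"blue mug": (1.0, 2.0, 0.8), "kitchen sink": (3.5, 0.5, 0.9)}
    log = pick_and_drop(
        "blue mug",
        "kitchen sink",
        locate=memory.get,
        navigate=lambda p: True,
        grasp=lambda q: True,
        drop=lambda: True,
    )
    print(succeeded(log))  # prints True
```

If `navigate` is replaced with a stub that returns `False`, the log stops after two entries and `succeeded` returns `False`. In this simplified model, end-to-end success is bounded by the weakest module, which matches the abstract's point that integration details matter.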