DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Abstract

Open-vocabulary mobile manipulation, in which a robot performs tasks in any environment given a natural language description, has seen significant progress. However, most current systems assume a static environment, which limits their applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code and our experiment and deployment videos are open-sourced and can be found on our project website: https://dynamem.github.io/
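To make the idea of a dynamic spatio-semantic memory concrete, the following is a minimal illustrative sketch and not the DynaMem implementation. The abstract does not specify the memory layout, so everything here is an assumption: the sparse voxel grid, the running-mean feature update, the explicit `remove` call for regions observed to be empty, and the toy 2-D vectors standing in for vision-language embeddings. All names (`VoxelMemory`, `add`, `remove`, `query`) are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class Voxel:
    feature: list      # running mean of the feature vectors observed in this voxel
    count: int         # number of observations merged so far
    last_seen: float   # timestamp of the most recent observation


@dataclass
class VoxelMemory:
    """Hypothetical sparse voxel map: integer grid key -> Voxel."""
    voxel_size: float = 0.05  # metres; an assumed value, not taken from the paper
    voxels: dict = field(default_factory=dict)

    def _key(self, point):
        # Quantise a 3D point to its integer voxel coordinates.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def add(self, point, feature, t):
        """Insert a new observation, or merge it into an existing voxel."""
        key = self._key(point)
        v = self.voxels.get(key)
        if v is None:
            self.voxels[key] = Voxel(list(feature), 1, t)
        else:
            n = v.count
            v.feature = [(old * n + new) / (n + 1)
                         for old, new in zip(v.feature, feature)]
            v.count = n + 1
            v.last_seen = t

    def remove(self, point):
        """Drop the voxel containing `point`, e.g. after it was observed empty."""
        self.voxels.pop(self._key(point), None)

    def query(self, text_feature):
        """Return (voxel centre, similarity) of the best match, or None if empty."""
        best_key, best_score = None, -2.0  # cosine similarity is always >= -1
        for key, v in self.voxels.items():
            score = cosine(v.feature, text_feature)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None:
            return None
        centre = tuple((k + 0.5) * self.voxel_size for k in best_key)
        return centre, best_score


# Toy usage: two objects, then the first one is moved away and removed.
memory = VoxelMemory(voxel_size=0.1)
memory.add((0.12, 0.02, 0.02), [1.0, 0.0], t=0.0)
memory.add((1.05, 1.05, 0.02), [0.0, 1.0], t=0.0)
print(memory.query([1.0, 0.0]))
memory.remove((0.12, 0.02, 0.02))
print(memory.query([1.0, 0.0]))
```

In this sketch, a low best similarity after removal could serve as a signal that the queried object is no longer in memory and that exploration is needed; how such a threshold would be set is left open here.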
