DynaMem: Dynamic Spatio-Semantic Memory for Open-World Mobile Manipulation

Abstract

Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given only a natural language description. However, most current systems assume a static environment, which limits their applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using either multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects that are not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code, along with our experiment and deployment videos, is open source and can be found on our project website: https://dynamem.github.io/
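
To make the idea of a dynamic spatio-semantic memory concrete, the following is a minimal sketch, not the DynaMem implementation. It assumes a sparse voxel map keyed by integer grid coordinates, where each voxel stores a unit-normalized feature vector and the time it was last observed. The class name `VoxelMemory`, its methods, the voxel size, and the similarity threshold are all hypothetical choices made for illustration, and the feature vectors stand in for embeddings that a vision-language model would produce.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class VoxelMemory:
    """Toy dynamic memory: voxel index -> (unit feature, last-seen timestamp)."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size  # metres; illustrative value only
        self.voxels = {}

    def _key(self, point):
        # Map a 3D point to the integer index of the voxel that contains it.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def add(self, points, features, timestamp):
        """Insert voxels, overwriting older content with the newest observation."""
        for p, f in zip(points, features):
            self.voxels[self._key(p)] = (_normalize(f), timestamp)

    def remove(self, points):
        """Forget voxels that a newer observation shows to be empty."""
        for p in points:
            self.voxels.pop(self._key(p), None)

    def query(self, text_feature, threshold=0.5):
        """Return the centre of the best-matching voxel, or None if nothing
        reaches the similarity threshold (the object is not in memory)."""
        q = _normalize(text_feature)
        best_key, best_sim = None, threshold
        for key, (feat, _) in self.voxels.items():
            sim = sum(a * b for a, b in zip(feat, q))
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None
        return tuple((k + 0.5) * self.voxel_size for k in best_key)
```

In this sketch, a `None` result from `query` plays the role of "object not found in memory", which is the case where a robot would fall back to exploration. Calling `remove` followed by `add` models an object that has moved between two observations. The stored timestamp is unused here and is kept only to suggest how stale entries might be aged out.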