DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Abstract

Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits their applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, more than a 2x improvement over state-of-the-art static systems. Our code and our experiment and deployment videos are open source and can be found on our project website: https://dynamem.github.io/
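To make the idea of a dynamic spatio-semantic memory concrete, the following is a minimal sketch, not DynaMem's actual implementation. It assumes a sparse voxel map that stores one feature vector per occupied voxel, removes voxels that a new observation reports as free space, and answers a query by cosine similarity. The names `VoxelMemory`, `update`, `query`, `cleared`, and `threshold` are hypothetical. The 2-D feature vectors are toy stand-ins for vision-language embeddings, and how free space is detected is left to the caller because the abstract does not specify it.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class VoxelMemory:
    """Hypothetical sparse voxel map: voxel index -> (feature, last-seen time)."""
    voxel_size: float = 0.5
    voxels: dict = field(default_factory=dict)

    def key(self, point):
        # Quantize a metric (x, y, z) point to an integer voxel index.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def update(self, observations, cleared=(), t=0):
        """Apply one observation to the memory.

        observations: iterable of ((x, y, z), feature) pairs seen now.
        cleared: points now observed as free space (assumed to be supplied
                 by the caller, e.g. from a depth-image consistency check).
        """
        for point in cleared:
            self.voxels.pop(self.key(point), None)   # object moved or vanished
        for point, feature in observations:
            self.voxels[self.key(point)] = (tuple(feature), t)

    def query(self, text_feature, threshold=0.5):
        """Return the center of the best-matching voxel, or None if no voxel
        is similar enough (the cue to go and explore instead)."""
        best_key, best_score = None, threshold
        for k, (feature, _) in self.voxels.items():
            score = cosine(feature, text_feature)
            if score >= best_score:
                best_key, best_score = k, score
        if best_key is None:
            return None
        return tuple((i + 0.5) * self.voxel_size for i in best_key)


if __name__ == "__main__":
    memory = VoxelMemory(voxel_size=0.5)
    memory.update([((1.0, 2.0, 0.5), (1.0, 0.0))], t=0)
    print(memory.query((1.0, 0.0)))
    # The object is moved: its old voxel is cleared and a new one is added.
    memory.update([((4.0, 4.0, 0.5), (1.0, 0.0))], cleared=[(1.0, 2.0, 0.5)], t=1)
    print(memory.query((1.0, 0.0)))
```

In this sketch a `None` result from `query` stands in for the "object not found in memory" case, where a robot would fall back to exploring the environment.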
