GOAT: GO To Any Thing

Abstract

In deployment scenarios such as homes and warehouses, mobile robots are expected to navigate autonomously for extended periods, seamlessly executing tasks expressed in terms that human operators find intuitive. We present GO To Any Thing (GOAT), a universal navigation system that addresses these requirements with three key features: a) Multimodal: it can handle goals specified via category labels, target images, and language descriptions, b) Lifelong: it benefits from its past experience in the same environment, and c) Platform Agnostic: it can be quickly deployed on robots with different embodiments. GOAT is made possible by a modular system design and a continually augmented instance-aware semantic memory that keeps track of the appearance of objects from different viewpoints in addition to category-level semantics. This memory lets GOAT distinguish between different instances of the same category, which in turn enables navigation to targets specified by images and language descriptions. In experimental comparisons spanning over 90 hours in 9 different homes, with 675 goals selected across 200+ different object instances, GOAT achieves an overall success rate of 83%, surpassing previous methods and ablations by 32% (absolute improvement). GOAT improves with experience in the environment, from a 60% success rate at the first goal to a 90% success rate after exploration. In addition, we demonstrate that GOAT can readily be applied to downstream tasks such as pick and place and social navigation.
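
To make the idea of an instance-aware semantic memory more concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation. The class names (`ObjectInstance`, `InstanceMemory`), the distance-based merge rule, the cosine-similarity matching, and the thresholds are all assumptions introduced here for illustration. Feature vectors are assumed to come from some image or text encoder that is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ObjectInstance:
    """One physical object: category, map location, one feature vector per view."""
    instance_id: int
    category: str
    location: Tuple[float, float]
    view_features: List[List[float]] = field(default_factory=list)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InstanceMemory:
    """Hypothetical memory that grows as the robot observes objects."""

    def __init__(self) -> None:
        self.instances: List[ObjectInstance] = []

    def add_observation(self, category: str, location: Tuple[float, float],
                        feature: List[float],
                        merge_radius: float = 0.5) -> ObjectInstance:
        # Assumed rule: a detection of the same category close to a known
        # instance is treated as a new view of that instance.
        for inst in self.instances:
            if (inst.category == category
                    and math.dist(inst.location, location) <= merge_radius):
                inst.view_features.append(feature)
                return inst
        inst = ObjectInstance(len(self.instances), category, location, [feature])
        self.instances.append(inst)
        return inst

    def find_by_category(self, category: str) -> List[ObjectInstance]:
        """Category goal: every stored instance of that category qualifies."""
        return [i for i in self.instances if i.category == category]

    def find_by_feature(self, query: List[float],
                        min_similarity: float = 0.8) -> Optional[ObjectInstance]:
        """Image or language goal: best-matching instance over all stored views."""
        best, best_score = None, min_similarity
        for inst in self.instances:
            score = max((cosine(query, f) for f in inst.view_features),
                        default=-1.0)
            if score >= best_score:
                best, best_score = inst, score
        return best
```

In this sketch, a category goal such as "chair" can match several stored instances, while an image or language goal is resolved to the single instance whose stored views best match the query. Returning `None` when nothing exceeds the threshold stands in for the case where the robot would need to explore further.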
