ObjectReact: Learning Object-Relative Control for Visual Navigation

Abstract

Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach that estimates control from a pair consisting of the current observation and a subgoal image. However, image-level representations of the world have limitations because images are strictly tied to the agent's pose and embodiment. In contrast, objects, being a property of the map, offer an embodiment- and trajectory-invariant world representation. In this work, we present a new paradigm of learning "object-relative" control that exhibits several desirable characteristics: a) new routes can be traversed without strictly imitating prior experience, b) the control prediction problem can be decoupled from the image matching problem, and c) high invariance can be achieved in cross-embodiment deployment under variations across both training-testing and mapping-execution settings. We propose a topometric map representation in the form of a "relative" 3D scene graph, which is used to obtain more informative object-level global path planning costs. We train a local controller, dubbed "ObjectReact", conditioned directly on a high-level "WayObject Costmap" representation that eliminates the need for an explicit RGB input. We demonstrate the advantages of learning object-relative control over its image-relative counterpart across sensor height variations and multiple navigation tasks that challenge the underlying spatial understanding capability, e.g., navigating a map trajectory in the reverse direction. We further show that our sim-only policy generalizes well to real-world indoor environments. Code and supplementary material are available on the project page: https://object-react.github.io/
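The abstract does not spell out how the object-level global path planning costs are computed. As a rough illustration only, the sketch below assumes the scene graph is an undirected graph whose nodes are objects and whose edge weights are relative distances, and it runs Dijkstra's algorithm from the goal object to give every object a path cost. The function name `object_costs_to_goal`, the toy objects, and the distances are all hypothetical and are not taken from the ObjectReact code.

```python
import heapq
from collections import defaultdict


def object_costs_to_goal(edges, goal):
    """Dijkstra from the goal over an undirected, weighted object graph.

    edges: iterable of (node_a, node_b, weight) tuples; the weights stand in
           for relative distances between object nodes (an assumption).
    Returns a dict mapping each reachable node to its path cost to `goal`.
    """
    graph = defaultdict(list)
    for a, b, w in edges:
        graph[a].append((b, w))
        graph[b].append((a, w))

    costs = {goal: 0.0}
    queue = [(0.0, goal)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, weight in graph[node]:
            new_cost = cost + weight
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs


# Toy graph: object names and distances are made up for illustration.
edges = [
    ("door", "chair", 2.0),
    ("chair", "table", 1.5),
    ("door", "plant", 1.0),
    ("plant", "table", 4.0),
]
costs = object_costs_to_goal(edges, goal="table")

# Among the objects currently in view, pick the one with the lowest cost to goal.
visible = ["door", "plant"]
best = min(visible, key=costs.get)
```

In this toy setup, each visible object carries a scalar cost to the goal, which is one plausible reading of how object-level costs could feed a costmap-style input to a local controller. How the actual "WayObject Costmap" is encoded is not stated in the abstract.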
