

Learning Generalizable Feature Fields for Mobile Manipulation

March 12, 2024
Authors: Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang
cs.AI

Abstract

An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use the representation both for navigating the environment and for manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent in an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's ability to generalize to open-set objects, as well as its running time, when performing open-vocabulary mobile manipulation in dynamic scenes.
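The two ingredients named in the abstract, NeRF-style rendering of a feature field and CLIP feature distillation, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names are hypothetical, and the real pipeline uses a generalizable NeRF encoder and a CLIP image encoder, neither of which is reproduced here. The idea is that per-point features predicted along a camera ray are alpha-composited into a rendered feature, which is then pushed toward the CLIP embedding of the corresponding view via a cosine loss:

```python
import numpy as np

def render_ray_features(point_feats, densities, deltas):
    """Alpha-composite per-sample features along one ray, NeRF-style.

    point_feats: [S, D] features at S samples along the ray
    densities:   [S]    predicted volume densities
    deltas:      [S]    distances between consecutive samples
    Returns the [D]-dim rendered feature for the ray.
    """
    alpha = 1.0 - np.exp(-densities * deltas)                        # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))    # accumulated transmittance
    weights = alpha * trans                                          # rendering weights, [S]
    return weights @ point_feats

def cosine_distill_loss(rendered, target):
    """Distillation loss: mean cosine distance between rendered features
    and target 2D CLIP features (both [N, D])."""
    r = rendered / np.linalg.norm(rendered, axis=-1, keepdims=True)
    t = target / np.linalg.norm(target, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(r * t, axis=-1)))
```

During training, `cosine_distill_loss` would be minimized alongside the novel-view-synthesis objective, so the same field that reconstructs geometry also carries language-aligned semantics.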

