Learning Generalizable Feature Fields for Mobile Manipulation
March 12, 2024
Authors: Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang
cs.AI
Abstract
An open problem in mobile manipulation is how to represent objects and scenes
in a unified manner, so that robots can use it both for navigating in the
environment and manipulating objects. The latter requires capturing intricate
geometry while understanding fine-grained semantics, whereas the former
involves capturing the complexity inherent to an expansive physical scale. In
this work, we present GeFF (Generalizable Feature Fields), a scene-level
generalizable neural feature field that acts as a unified, real-time
representation for both navigation and manipulation. To do so, we treat
generative novel view synthesis as a pre-training task, and then align the
resulting rich scene priors with natural language via CLIP feature
distillation. We demonstrate the effectiveness of this approach by deploying
GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's
ability to generalize to open-set objects, as well as its running time, when
performing open-vocabulary mobile manipulation in dynamic scenes.
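The core mechanism the abstract describes — distilling CLIP features into a feature field so that natural-language queries can ground open-set objects — can be sketched minimally. The function names, the cosine-similarity loss form, and the query scheme below are illustrative assumptions for intuition, not the paper's actual implementation:

```python
import numpy as np

def distill_loss(rendered, teacher):
    # Hypothetical distillation objective: one minus the mean cosine
    # similarity between features rendered from the feature field and
    # features produced by a frozen CLIP teacher, averaged over pixels.
    r = rendered / np.linalg.norm(rendered, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return 1.0 - float(np.mean(np.sum(r * t, axis=-1)))

def query_points(point_feats, text_emb):
    # Open-vocabulary grounding sketch: score each 3D point's distilled
    # feature against a CLIP text embedding; high-scoring points would
    # localize the queried object for navigation or manipulation.
    p = point_feats / np.linalg.norm(point_feats, axis=-1, keepdims=True)
    q = text_emb / np.linalg.norm(text_emb)
    return p @ q
```

Because the distilled features live in CLIP's embedding space, the same field can serve a coarse navigation query ("go to the kitchen") and a fine-grained manipulation query ("pick up the mug") without retraining.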