GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Abstract

It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural Feature Fields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/.
