

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

August 31, 2023
作者: Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang
cs.AI

Abstract

It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/.
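The abstract describes a two-branch design: one shared deep 3D voxel representation feeds both a reconstruction module (the GNF, supervised with rendered RGB and distilled vision-language features) and a decision-making module (a Perceiver Transformer policy), and the two objectives are optimized jointly. The snippet below is a minimal sketch of that joint-optimization structure, not the authors' implementation: all module names, channel sizes, the simple MSE losses, and the loss weight are illustrative assumptions, with placeholder tensors standing in for rendering targets, distilled Stable Diffusion features, and demonstration actions.

```python
# Minimal sketch (not the authors' code) of a shared voxel encoder with a
# reconstruction head and a decision head optimized jointly. Names, sizes,
# losses, and the 0.1 weight are illustrative assumptions.
import torch
import torch.nn as nn

class SharedVoxelEncoder(nn.Module):
    """Encodes an observed voxel grid into a deep 3D feature volume."""
    def __init__(self, in_ch=10, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
    def forward(self, voxels):          # (B, in_ch, D, H, W)
        return self.net(voxels)         # (B, feat_ch, D, H, W)

class ReconstructionHead(nn.Module):
    """Stand-in for the GNF branch: predicts per-voxel RGB plus a semantic
    embedding to be matched against distilled vision-language features."""
    def __init__(self, feat_ch=64, sem_ch=512):
        super().__init__()
        self.rgb = nn.Conv3d(feat_ch, 3, 1)
        self.sem = nn.Conv3d(feat_ch, sem_ch, 1)
    def forward(self, vol):
        return self.rgb(vol), self.sem(vol)

class DecisionHead(nn.Module):
    """Stand-in for the Perceiver Transformer policy: pools the feature
    volume and predicts an action vector."""
    def __init__(self, feat_ch=64, act_dim=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.mlp = nn.Sequential(nn.Linear(feat_ch, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))
    def forward(self, vol):
        return self.mlp(self.pool(vol).flatten(1))

# Joint optimization of both objectives on a dummy batch.
enc, recon, policy = SharedVoxelEncoder(), ReconstructionHead(), DecisionHead()
params = [*enc.parameters(), *recon.parameters(), *policy.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

voxels     = torch.randn(2, 10, 16, 16, 16)    # observed voxel grid
target_rgb = torch.randn(2, 3, 16, 16, 16)     # rendering target (placeholder)
target_sem = torch.randn(2, 512, 16, 16, 16)   # distilled semantic features (placeholder)
target_act = torch.randn(2, 8)                 # expert action from a demonstration

vol = enc(voxels)                  # shared deep 3D voxel representation
rgb, sem = recon(vol)              # reconstruction branch
act = policy(vol)                  # decision branch
loss = nn.functional.mse_loss(act, target_act) \
     + 0.1 * (nn.functional.mse_loss(rgb, target_rgb)
              + nn.functional.mse_loss(sem, target_sem))
loss.backward()
opt.step()
```

The key point the sketch tries to convey is that the reconstruction loss acts as an auxiliary signal shaping the same voxel features the policy consumes; the specific renderer, feature distillation, and action parameterization in GNFactor differ and are detailed in the paper.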