
RVT-2: Learning Precise Manipulation from Few Demonstrations

June 12, 2024
Authors: Ankit Goyal, Valts Blukis, Jie Xu, Yijie Guo, Yu-Wei Chao, Dieter Fox
cs.AI

Abstract

In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem; however, they often struggle with tasks requiring high precision. We study how to make them more effective, precise, and fast. Using a combination of architectural and system-level improvements, we propose RVT-2, a multitask 3D manipulation model that is 6X faster in training and 2X faster in inference than its predecessor RVT. RVT-2 achieves a new state-of-the-art on RLBench, improving the success rate from 65% to 82%. RVT-2 is also effective in the real world, where it can learn tasks requiring high precision, like picking up and inserting plugs, with just 10 demonstrations. Visual results, code, and trained models are provided at: https://robotic-view-transformer-2.github.io/.
