3D Diffusion Policy
March 6, 2024
Authors: Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, Huazhe Xu
cs.AI
Abstract
Imitation learning provides an efficient way to teach robots dexterous
skills; however, learning complex skills robustly and generalizably usually
requires large amounts of human demonstrations. To tackle this challenging
problem, we present 3D Diffusion Policy (DP3), a novel visual imitation
learning approach that incorporates the power of 3D visual representations into
diffusion policies, a class of conditional action generative models. The core
design of DP3 is the utilization of a compact 3D visual representation,
extracted from sparse point clouds with an efficient point encoder. In our
experiments involving 72 simulation tasks, DP3 successfully handles most tasks
with just 10 demonstrations and surpasses baselines with a 55.3% relative
improvement. In 4 real robot tasks, DP3 demonstrates precise control with a
high success rate of 85%, given only 40 demonstrations of each task, and shows
excellent generalization abilities in diverse aspects, including space,
viewpoint, appearance, and instance. Interestingly, in real robot experiments,
DP3 rarely violates safety requirements, in contrast to baseline methods which
frequently do, necessitating human intervention. Our extensive evaluation
highlights the critical importance of 3D representations in real-world robot
learning. Videos, code, and data are available on
https://3d-diffusion-policy.github.io .
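
The abstract names DP3's two ingredients, a compact point-cloud encoder and a conditional diffusion policy, without implementation detail. The sketch below is a hypothetical illustration of how they could fit together, assuming a PointNet-style MLP encoder with max-pooling for the compact 3D representation and a standard DDPM denoising loop for conditional action generation; all class names, dimensions, and the noise schedule are illustrative assumptions, not the paper's actual architecture.

```python
# Illustrative sketch of a DP3-style pipeline (not the authors' code).
# Assumptions: PointNet-style MLP encoder + max-pool for the compact 3D
# feature; DDPM ancestral sampling for conditional action generation.
import torch
import torch.nn as nn


class PointEncoder(nn.Module):
    """Encode a sparse point cloud (B, N, 3) into a compact global feature (B, D)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        per_point = self.mlp(points)           # (B, N, D) per-point features
        return per_point.max(dim=1).values     # order-invariant max-pool -> (B, D)


class DenoiseNet(nn.Module):
    """Predict the noise added to an action, conditioned on the scene feature."""

    def __init__(self, action_dim: int = 7, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim + feat_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, noisy_action, cond, t):
        t = t.float().unsqueeze(-1) / 1000.0   # crude normalized timestep embedding
        return self.net(torch.cat([noisy_action, cond, t], dim=-1))


@torch.no_grad()
def sample_action(encoder, denoiser, points, action_dim=7, steps=1000):
    """DDPM-style ancestral sampling of one action, conditioned on the point cloud."""
    betas = torch.linspace(1e-4, 0.02, steps)  # hypothetical linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    cond = encoder(points)                            # compact 3D representation
    a = torch.randn(points.shape[0], action_dim)      # start from pure noise
    for t in reversed(range(steps)):
        t_batch = torch.full((points.shape[0],), t)
        eps = denoiser(a, cond, t_batch)
        # Posterior mean of the reverse step (standard DDPM update).
        a = (a - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
    return a


# Usage (shapes only): one action from a 512-point cloud.
# action = sample_action(PointEncoder(), DenoiseNet(), torch.randn(1, 512, 3))
```

The key design choice the abstract highlights is reflected here: the policy conditions on a single compact vector pooled from a sparse point cloud rather than on dense images, which keeps the encoder small and the conditioning signal viewpoint- and appearance-robust.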