3D Diffusion Policy
March 6, 2024
Authors: Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, Huazhe Xu
cs.AI
Abstract
Imitation learning provides an efficient way to teach robots dexterous
skills; however, learning complex skills robustly and generalizably usually
requires large amounts of human demonstrations. To tackle this challenging
problem, we present 3D Diffusion Policy (DP3), a novel visual imitation
learning approach that incorporates the power of 3D visual representations into
diffusion policies, a class of conditional action generative models. The core
design of DP3 is the utilization of a compact 3D visual representation,
extracted from sparse point clouds with an efficient point encoder. In our
experiments involving 72 simulation tasks, DP3 successfully handles most tasks
with just 10 demonstrations and surpasses baselines with a 55.3% relative
improvement. In 4 real robot tasks, DP3 demonstrates precise control with a
high success rate of 85%, given only 40 demonstrations of each task, and shows
excellent generalization abilities in diverse aspects, including space,
viewpoint, appearance, and instance. Interestingly, in real robot experiments,
DP3 rarely violates safety requirements, in contrast to baseline methods which
frequently do, necessitating human intervention. Our extensive evaluation
highlights the critical importance of 3D representations in real-world robot
learning. Videos, code, and data are available at
https://3d-diffusion-policy.github.io.
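To make the core design concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a lightweight point encoder compresses a sparse point cloud into a compact embedding, which then conditions a diffusion model that denoises action sequences. The module names, layer sizes, horizon, and the simplified DDPM-style training step are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Illustrative sketch of the DP3 idea (assumed architecture, not the paper's code):
# encode a sparse point cloud into a compact embedding, then condition a
# diffusion model that denoises demonstrated action sequences.
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """PointNet-style encoder: a per-point MLP followed by max pooling."""
    def __init__(self, out_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) sparse point cloud -> (B, out_dim) compact embedding.
        # Max pooling makes the embedding invariant to point ordering.
        return self.mlp(points).max(dim=1).values

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added to an action chunk, conditioned on the visual
    embedding and the diffusion timestep (passed as a raw scalar here; a
    learned timestep embedding would normally be used)."""
    def __init__(self, action_dim: int, horizon: int, cond_dim: int = 64):
        super().__init__()
        self.action_dim, self.horizon = action_dim, horizon
        self.net = nn.Sequential(
            nn.Linear(action_dim * horizon + cond_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim * horizon),
        )

    def forward(self, noisy_actions, cond, t):
        x = torch.cat([noisy_actions.flatten(1), cond,
                       t.float().unsqueeze(1)], dim=-1)
        return self.net(x).view(-1, self.horizon, self.action_dim)

# One DDPM-style training step on a (synthetic) demonstration batch.
encoder = PointEncoder()
denoiser = ConditionalDenoiser(action_dim=7, horizon=8)
T = 100
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

points = torch.randn(4, 512, 3)   # stand-in for a sparse point cloud batch
actions = torch.randn(4, 8, 7)    # stand-in for demonstrated action chunks
t = torch.randint(0, T, (4,))
noise = torch.randn_like(actions)
ab = alpha_bars[t].view(-1, 1, 1)
noisy = ab.sqrt() * actions + (1 - ab).sqrt() * noise

pred = denoiser(noisy, encoder(points), t)
loss = nn.functional.mse_loss(pred, noise)
loss.backward()  # encoder and denoiser are optimized end to end
```

The design point the abstract emphasizes is the compactness of this 3D representation: a small per-point MLP with max pooling over a sparse cloud yields a single low-dimensional conditioning vector, which keeps the policy efficient compared with image-based visual backbones.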