3D Diffusion Policy

Abstract

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and with good generalization usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the use of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 55.3% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, whereas baseline methods frequently do and require human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io.
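To make the idea of a compact representation extracted from a sparse point cloud concrete, the following is a minimal sketch of one common way to build such an encoder: a small MLP shared across points, followed by max-pooling over the points. The abstract does not specify the encoder architecture, so the layer sizes, the random weights, and the function names (`make_linear`, `linear_relu`, `encode_point_cloud`) are all illustrative assumptions rather than the DP3 implementation.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Create one linear layer with random weights and zero biases.

    Assumption: weights are random placeholders; a real encoder would be trained.
    """
    scale = (2.0 / in_dim) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def linear_relu(x, layer):
    """Apply a linear layer followed by ReLU to a single feature vector."""
    weights, biases = layer
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def encode_point_cloud(points, layers):
    """Encode a point cloud into one fixed-size vector.

    Each point passes through the same MLP; a per-channel max over all
    points then yields an embedding that does not depend on point order.
    """
    features = []
    for point in points:
        h = list(point)
        for layer in layers:
            h = linear_relu(h, layer)
        features.append(h)
    return [max(channel) for channel in zip(*features)]


rng = random.Random(0)
# Hypothetical sizes: 3D points -> 16 hidden units -> 8-dimensional embedding.
layers = [make_linear(3, 16, rng), make_linear(16, 8, rng)]
# A sparse synthetic cloud of 64 points in the cube [-1, 1]^3.
points = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(64)]

embedding = encode_point_cloud(points, layers)
print(len(embedding))
```

Because the pooling step takes a maximum per channel, reordering the input points leaves the embedding unchanged, and the output size stays fixed regardless of how many points are in the cloud. In a policy of the kind the abstract describes, a vector like this would serve as the conditioning input to the action generation model.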