DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

Abstract

Diffusion models have recently emerged as a powerful generative technique for robotic policy learning, capable of modeling multi-mode action distributions. Leveraging this capability for end-to-end autonomous driving is a promising direction. However, the numerous denoising steps in robotic diffusion policies and the more dynamic, open-world nature of traffic scenes pose substantial challenges for generating diverse driving actions at real-time speed. To address these challenges, we propose a novel truncated diffusion policy that incorporates prior multi-mode anchors and truncates the diffusion schedule, enabling the model to learn denoising from an anchored Gaussian distribution to the multi-mode driving action distribution. Additionally, we design an efficient cascade diffusion decoder for enhanced interaction with the conditional scene context. The proposed model, DiffusionDrive, demonstrates a 10× reduction in denoising steps compared to the vanilla diffusion policy, delivering superior diversity and quality in just 2 steps. On the planning-oriented NAVSIM dataset, with the aligned ResNet-34 backbone, DiffusionDrive achieves 88.1 PDMS without bells and whistles, setting a new record, while running at a real-time speed of 45 FPS on an NVIDIA 4090. Qualitative results on challenging scenarios further confirm that DiffusionDrive can robustly generate diverse plausible driving actions. Code and model will be available at https://github.com/hustvl/DiffusionDrive.
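To make the idea of denoising from an anchored Gaussian over a truncated schedule more concrete, here is a minimal NumPy sketch. It is not the released implementation. The linear beta schedule, the 1000 training steps, the truncation point of 50, the DDIM-style deterministic update, and the names `make_alpha_bar`, `sample_anchored_gaussian`, `toy_denoiser`, and `truncated_sampling` are all assumptions made for illustration. In particular, `toy_denoiser` is a hypothetical stand-in for the learned decoder and ignores scene context entirely.

```python
import numpy as np


def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention schedule (assumed: linear betas, 1000 steps)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def sample_anchored_gaussian(anchors, alpha_bar_t, rng):
    """Draw one noisy trajectory per anchor, centred on a scaled copy of that anchor.

    Starting here, rather than from pure Gaussian noise, is what lets the
    schedule be truncated: only a small amount of noise has to be removed.
    """
    eps = rng.standard_normal(anchors.shape)
    return np.sqrt(alpha_bar_t) * anchors + np.sqrt(1.0 - alpha_bar_t) * eps


def toy_denoiser(noisy, anchors, t):
    """Hypothetical stand-in for the learned decoder: predicts a clean trajectory.

    It simply averages each sample with its anchor. A real model would be a
    network conditioned on scene context and on the timestep t.
    """
    return 0.5 * (noisy + anchors)


def truncated_sampling(anchors, alpha_bar, t_trunc=50, num_steps=2, seed=0):
    """Denoise anchored samples in num_steps steps, starting at t_trunc - 1."""
    rng = np.random.default_rng(seed)
    # With the defaults this visits timesteps [49, 24].
    timesteps = np.linspace(t_trunc - 1, 0, num_steps, endpoint=False).round().astype(int)
    x = sample_anchored_gaussian(anchors, alpha_bar[timesteps[0]], rng)
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t]
        # After the last step the sample is treated as fully clean (alpha_bar = 1).
        ab_next = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x0_pred = toy_denoiser(x, anchors, t)
        # Deterministic DDIM-style update (an assumed sampler choice).
        eps_pred = (x - np.sqrt(ab_t) * x0_pred) / np.sqrt(1.0 - ab_t)
        x = np.sqrt(ab_next) * x0_pred + np.sqrt(1.0 - ab_next) * eps_pred
    return x
```

Calling `truncated_sampling(anchors, make_alpha_bar())` with an array of shape `(num_anchors, num_waypoints, 2)` returns one denoised trajectory per anchor. Because each sample starts near a different anchor, the outputs stay multi-mode even with only two denoising steps.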
