RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Abstract

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, most notably a 3x lower collision rate. Abundant closed-loop results are presented at https://hgao-cv.github.io/RAD.
