RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
February 18, 2025
Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang
cs.AI
Abstract
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, especially a 3x lower collision rate. Extensive closed-loop results are presented at https://hgao-cv.github.io/RAD.
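As a rough illustration of the "IL as a regularization term in RL training" idea mentioned above, the sketch below combines a policy-gradient RL objective with a behavior-cloning term that keeps the policy close to logged human driving actions. This is an assumed, minimal form for illustration only; the function name, loss shapes, and the weight il_weight are hypothetical and need not match the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(log_probs, advantages, pred_actions, expert_actions, il_weight=0.5):
    """Hypothetical RL + IL objective: RL term plus an imitation regularizer."""
    # RL term (REINFORCE-style): advantage-weighted negative log-likelihood
    # of the actions the policy actually took in the closed-loop rollout.
    rl_loss = -(log_probs * advantages.detach()).mean()
    # IL regularizer: penalize deviation from logged human driving actions.
    il_loss = F.mse_loss(pred_actions, expert_actions)
    return rl_loss + il_weight * il_loss
```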