RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
February 18, 2025
Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu Li, Xinbang Zhang, Ying Zhang, Wenyu Liu, Qian Zhang, Xinggang Wang
cs.AI
Abstract
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, especially a 3x lower collision rate. Extensive closed-loop results are presented at https://hgao-cv.github.io/RAD.
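As a rough illustration of the "IL as a regularization term in RL training" idea mentioned above, the sketch below combines a policy-gradient RL objective with a behavior-cloning term that keeps the policy close to logged human driving actions. This is an assumed, minimal form for illustration only; the function name, loss shapes, and the weight il_weight are hypothetical and need not match the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(log_probs, advantages, pred_actions, expert_actions, il_weight=0.5):
    """Hypothetical RL + IL objective: RL term plus an imitation regularizer."""
    # RL term (REINFORCE-style): advantage-weighted negative log-likelihood
    # of the actions the policy actually took in the closed-loop rollout.
    rl_loss = -(log_probs * advantages.detach()).mean()
    # IL regularizer: penalize deviation from logged human driving actions.
    il_loss = F.mse_loss(pred_actions, expert_actions)
    return rl_loss + il_weight * il_loss
```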