CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
October 14, 2025
Authors: Xiaoji Zheng, Ziyuan Yang, Yanhao Chen, Yuhang Peng, Yuanrong Tang, Gengyuan Liu, Bokui Chen, Jiangtao Gong
cs.AI
Abstract
End-to-end autonomous driving models trained solely with imitation learning
(IL) often suffer from poor generalization. In contrast, reinforcement learning
(RL) promotes exploration through reward maximization but faces challenges such
as sample inefficiency and unstable convergence. A natural solution is to
combine IL and RL. Moving beyond the conventional two-stage paradigm (IL
pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive
dual-policy framework that enables IL and RL agents to interact during
training. CoIRL-AD introduces a competition-based mechanism that facilitates
knowledge exchange while preventing gradient conflicts. Experiments on the
nuScenes dataset show an 18% reduction in collision rate compared to baselines,
along with stronger generalization and improved performance on long-tail
scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
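To make the described mechanism concrete, below is a minimal, hypothetical sketch of one competitive dual-policy update step. None of the names (make_policy, score_in_world_model, train_step, distill_weight) come from the paper or its repository; the sketch only illustrates the idea stated in the abstract under simplifying assumptions: the IL and RL agents keep separate objectives and optimizers (so their gradients never mix), and after each competition the winner's action serves as a detached distillation target for both agents, which is one plausible way knowledge could be exchanged.

```python
# Conceptual sketch of a competition-based dual-policy update.
# Hypothetical names throughout; NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, ACTION_DIM = 64, 2  # e.g., latent world-model state -> (steer, accel)

def make_policy():
    return nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                         nn.Linear(128, ACTION_DIM))

il_policy, rl_policy = make_policy(), make_policy()
il_opt = torch.optim.Adam(il_policy.parameters(), lr=1e-4)
rl_opt = torch.optim.Adam(rl_policy.parameters(), lr=1e-4)

def score_in_world_model(latent, action):
    # Placeholder for rolling out the action in the latent world model
    # and returning a scalar return estimate (higher is better).
    return -(action ** 2).sum(dim=-1)  # stand-in reward for illustration

def train_step(latent, expert_action, distill_weight=0.5):
    a_il, a_rl = il_policy(latent), rl_policy(latent)

    # Competition: score both agents' proposals; the winner's action
    # becomes a detached teacher signal (knowledge exchange without
    # backpropagating one agent's loss into the other).
    with torch.no_grad():
        il_wins = (score_in_world_model(latent, a_il)
                   >= score_in_world_model(latent, a_rl))
        teacher = torch.where(il_wins.unsqueeze(-1), a_il, a_rl)

    # IL agent: behavior cloning toward the expert, plus distillation
    # toward the competition winner. Updated by its own optimizer only.
    il_loss = (F.mse_loss(a_il, expert_action)
               + distill_weight * F.mse_loss(a_il, teacher))
    il_opt.zero_grad(); il_loss.backward(); il_opt.step()

    # RL agent: reward maximization (deterministic surrogate here),
    # plus distillation toward the winner. Separate optimizer, so the
    # two agents' gradients never conflict on shared parameters.
    a_rl = rl_policy(latent)  # recompute on a fresh graph after the IL step
    rl_loss = (-score_in_world_model(latent, a_rl).mean()
               + distill_weight * F.mse_loss(a_rl, teacher))
    rl_opt.zero_grad(); rl_loss.backward(); rl_opt.step()
    return il_loss.item(), rl_loss.item()

# Usage with dummy data:
latent = torch.randn(8, LATENT_DIM)
expert = torch.randn(8, ACTION_DIM)
print(train_step(latent, expert))
```

The design point mirrored here is that the two agents are coupled only through detached teacher targets, never through a shared gradient path, which is consistent with the abstract's claim of enabling knowledge exchange while preventing gradient conflicts.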