Found-RL: foundation model-enhanced reinforcement learning for autonomous driving
February 11, 2026
Authors: Yansong Qu, Zihao Sheng, Zilin Huang, Jiancong Chen, Yuhao Luo, Tianyi Wang, Yiheng Feng, Samuel Labi, Sikai Chen
cs.AI
Abstract
Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation models, particularly Vision-Language Models (VLMs), can mitigate these issues by offering rich, context-aware knowledge, yet their high inference latency hinders deployment in high-frequency RL training loops. To bridge this gap, we present Found-RL, a platform tailored to efficiently enhance RL for AD using foundation models. A core innovation is the asynchronous batch inference framework, which decouples heavy VLM reasoning from the simulation loop, resolving the latency bottleneck and enabling real-time learning. We introduce diverse supervision mechanisms, Value-Margin Regularization (VMR) and Advantage-Weighted Action Guidance (AWAG), to distill expert-like VLM action suggestions into the RL policy. Additionally, we adopt the high-throughput CLIP model for dense reward shaping and address CLIP's dynamic blindness via Conditional Contrastive Action Alignment, which conditions prompts on discretized speed/command context and yields a normalized, margin-based bonus from context-specific action-anchor scoring. Found-RL provides an end-to-end pipeline for fine-tuned VLM integration and shows that a lightweight RL model can approach the performance of billion-parameter VLMs while sustaining real-time inference (approx. 500 FPS). Code, data, and models will be publicly available at https://github.com/ys-qu/found-rl.
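The asynchronous batch inference idea can be illustrated with a minimal sketch: a background worker drains queued observations, runs the heavy VLM call in batches, and publishes the latest suggestion per environment, so the high-frequency simulation/RL loop never blocks on VLM latency. Everything below (the AsyncVLMAdvisor class, the vlm_infer_fn callable, the batch size and timeout values) is an illustrative assumption, not the paper's implementation.

```python
# Hedged sketch: decouple heavy VLM reasoning from the simulation loop.
# vlm_infer_fn is a user-supplied callable standing in for the fine-tuned VLM (assumption).
import queue
import threading

class AsyncVLMAdvisor:
    def __init__(self, vlm_infer_fn, batch_size=8, timeout=0.05):
        self.vlm_infer_fn = vlm_infer_fn   # list of observations -> list of action suggestions
        self.batch_size = batch_size
        self.timeout = timeout
        self.requests = queue.Queue()
        self.latest = {}                   # env_id -> most recent VLM suggestion
        self.lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, env_id, obs):
        # Called from the simulation loop; returns immediately.
        self.requests.put((env_id, obs))

    def get_suggestion(self, env_id):
        # Non-blocking read of the most recent suggestion (may be slightly stale).
        with self.lock:
            return self.latest.get(env_id)

    def _worker(self):
        while True:
            batch = [self.requests.get()]  # block until at least one request arrives
            while len(batch) < self.batch_size:
                try:
                    batch.append(self.requests.get(timeout=self.timeout))
                except queue.Empty:
                    break
            ids, obs = zip(*batch)
            suggestions = self.vlm_infer_fn(list(obs))  # heavy call, off the hot path
            with self.lock:
                for env_id, s in zip(ids, suggestions):
                    self.latest[env_id] = s
```

The trade-off made explicit by this sketch is staleness for throughput: the policy consumes the most recent available suggestion rather than waiting for a fresh VLM response at every step.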
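Similarly, a hedged sketch of a conditional contrastive action-alignment bonus built on the public OpenAI CLIP package: prompts are conditioned on a discretized speed bin and navigation command, the current frame is scored against one anchor prompt per candidate action, and the normalized margin between the executed action's anchor and its best competitor serves as a dense shaping reward. The prompt wording, the ACTIONS list, the ViT-B/32 backbone, and the exact margin definition are assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch of context-conditioned action-anchor scoring with CLIP.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

ACTIONS = ["accelerate", "keep speed", "brake", "steer left", "steer right"]  # illustrative

def build_anchor_prompts(speed_bin: str, command: str):
    # One context-conditioned prompt per candidate action (wording is an assumption).
    return [f"driving at {speed_bin} speed, instructed to {command}, the car should {a}"
            for a in ACTIONS]

@torch.no_grad()
def margin_bonus(frame, speed_bin: str, command: str, executed_action: str) -> float:
    """Return a normalized margin-based bonus in (-1, 1) for the executed action."""
    image = preprocess(frame).unsqueeze(0).to(device)            # PIL image -> tensor
    tokens = clip.tokenize(build_anchor_prompts(speed_bin, command)).to(device)
    img_feat = model.encode_image(image).float()
    txt_feat = model.encode_text(tokens).float()
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1).squeeze(0)  # score per anchor
    idx = ACTIONS.index(executed_action)
    competing = torch.cat([probs[:idx], probs[idx + 1:]])
    # Positive when the executed action's anchor beats every competing anchor.
    return float(probs[idx] - competing.max())
```

Because the anchor scores are softmax-normalized within each context, the resulting margin is bounded and comparable across frames, which is what makes it usable as a dense shaping term alongside the environment reward.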