Found-RL: Foundation Model-Enhanced Reinforcement Learning for Autonomous Driving
February 11, 2026
Authors: Yansong Qu, Zihao Sheng, Zilin Huang, Jiancong Chen, Yuhao Luo, Tianyi Wang, Yiheng Feng, Samuel Labi, Sikai Chen
cs.AI
Abstract
Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD). However, RL suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation Models, particularly Vision-Language Models (VLMs), can mitigate this by offering rich, context-aware knowledge, yet their high inference latency hinders deployment in high-frequency RL training loops. To bridge this gap, we present Found-RL, a platform tailored to efficiently enhance RL for AD using foundation models. Its core innovation is an asynchronous batch inference framework that decouples heavy VLM reasoning from the simulation loop, resolving the latency bottleneck to support real-time learning. We introduce two supervision mechanisms, Value-Margin Regularization (VMR) and Advantage-Weighted Action Guidance (AWAG), which distill expert-like VLM action suggestions into the RL policy. Additionally, we adopt a high-throughput CLIP model for dense reward shaping and address CLIP's dynamic blindness via Conditional Contrastive Action Alignment, which conditions prompts on discretized speed and navigation commands and yields a normalized, margin-based bonus from context-specific action-anchor scoring. Found-RL provides an end-to-end pipeline for integrating fine-tuned VLMs and demonstrates that a lightweight RL model can approach the performance of billion-parameter VLMs while sustaining real-time inference (approx. 500 FPS). Code, data, and models will be publicly available at https://github.com/ys-qu/found-rl.
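To make the asynchronous batch inference idea concrete, below is a minimal Python sketch of the pattern the abstract describes: VLM requests are queued and batched in a background thread, while the simulation loop submits observations and reads the latest cached advice without ever blocking. The names (AsyncVLMServer, vlm_infer) are illustrative assumptions, not Found-RL's actual API.

```python
# Minimal sketch of asynchronous batch VLM inference: heavy VLM calls run in a
# background worker so the high-frequency simulation loop never blocks.
# All names here are illustrative assumptions, not Found-RL's API.
import queue
import threading

class AsyncVLMServer:
    def __init__(self, vlm_infer, batch_size=8, timeout=0.05):
        self.vlm_infer = vlm_infer          # callable: list[obs] -> list[advice]
        self.batch_size = batch_size
        self.timeout = timeout
        self.requests = queue.Queue()
        self.cache = {}                     # env_id -> latest VLM advice
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, env_id, obs):
        """Non-blocking: enqueue an observation snapshot for VLM labeling."""
        self.requests.put((env_id, obs))

    def latest(self, env_id):
        """Non-blocking: return the most recent advice (possibly stale or None)."""
        return self.cache.get(env_id)

    def _worker(self):
        while True:
            batch = [self.requests.get()]   # block until at least one request
            while len(batch) < self.batch_size:
                try:
                    batch.append(self.requests.get(timeout=self.timeout))
                except queue.Empty:
                    break                   # flush a partial batch on timeout
            ids, obs = zip(*batch)
            for env_id, advice in zip(ids, self.vlm_infer(list(obs))):
                self.cache[env_id] = advice  # RL loop reads this asynchronously
```

The design choice is that the RL loop tolerates slightly stale advice: a cached label from a few frames ago is still useful for shaping, whereas blocking on a multi-second VLM call at every step would not be.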
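The abstract does not spell out the loss forms, but VMR and AWAG can be read as a large-margin value constraint on the VLM-suggested action and an advantage-weighted imitation term, respectively. The PyTorch sketch below instantiates that reading; the exact formulations in Found-RL may differ.

```python
# Hedged sketch of the two supervision terms named in the abstract, read in
# the spirit of large-margin imitation (VMR) and advantage-weighted regression
# (AWAG). These are assumed forms, not Found-RL's confirmed losses.
import torch
import torch.nn.functional as F

def vmr_loss(q_values, vlm_action, margin=0.1):
    """Value-Margin Regularization (assumed form): push Q(s, a_vlm) above
    every other action's Q-value by at least `margin`.
    q_values: (B, num_actions) float tensor; vlm_action: (B,) long tensor."""
    q_vlm = q_values.gather(1, vlm_action.unsqueeze(1)).squeeze(1)      # (B,)
    mask = F.one_hot(vlm_action, q_values.size(1)).bool()
    q_other = q_values.masked_fill(mask, float('-inf')).max(dim=1).values
    return F.relu(margin + q_other - q_vlm).mean()

def awag_loss(log_probs, advantages, beta=1.0):
    """Advantage-Weighted Action Guidance (assumed form): behavior-clone the
    VLM suggestion, weighted by exp(A / beta) so low-advantage suggestions
    are discounted. log_probs: log pi(a_vlm | s); advantages: A(s, a_vlm)."""
    weights = torch.exp(advantages.detach() / beta).clamp(max=20.0)
    return -(weights * log_probs).mean()
```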
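Similarly, the conditional contrastive reward can be sketched as follows: prompts are templated on a discretized speed bin and the navigation command, CLIP scores the current frame against one text anchor per action, and the bonus is a normalized margin between the executed action's score and its best competitor. The prompt wording, action set, speed bins, and the encode_text call are assumptions for illustration only.

```python
# Hedged sketch of Conditional Contrastive Action Alignment as described in
# the abstract: context-conditioned prompts, per-action anchor scoring with
# CLIP, and a normalized margin-based bonus. Helper names and the
# clip_model.encode_text interface are assumptions.
import torch

ACTIONS = ["accelerate", "keep speed", "brake", "turn left", "turn right"]

def speed_bin(speed_mps):
    """Discretize ego speed into coarse bins (thresholds are assumptions)."""
    return "stopped" if speed_mps < 0.5 else "slow" if speed_mps < 8 else "fast"

def clip_action_bonus(clip_model, image_feat, executed_action, speed_mps, command):
    """Return a shaped bonus in [-1, 1]: margin between the executed action's
    CLIP score and the best competing action under the current context.
    image_feat: (d,) CLIP image embedding of the current frame."""
    ctx = f"the ego vehicle is {speed_bin(speed_mps)} and instructed to {command}"
    prompts = [f"a driving scene where the correct action is to {a}, given {ctx}"
               for a in ACTIONS]
    text_feat = clip_model.encode_text(prompts)                 # (A, d), assumed API
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    img = image_feat / image_feat.norm(dim=-1, keepdim=True)
    scores = text_feat @ img                                    # cosine per anchor
    i = ACTIONS.index(executed_action)
    others = torch.cat([scores[:i], scores[i + 1:]])
    margin = scores[i] - others.max()
    return torch.tanh(margin / 0.05).item()                     # normalized bonus
```

Conditioning the anchors on speed and command is what addresses the "dynamic blindness" the abstract mentions: a single static image cannot tell CLIP whether braking or accelerating is appropriate, but a context-specific prompt can.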