OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
December 1, 2025
Authors: Songyan Zhang, Wenhui Huang, Zhan Chen, Chua Jiahao Collister, Qihang Huang, Chen Lv
cs.AI
Abstract
Recently, two-stage fine-tuning strategies, e.g., acquiring essential driving knowledge through supervised fine-tuning (SFT) and further enhancing decision-making and planning via reinforcement fine-tuning (RFT), have shown strong potential in advancing the knowledge-driven autonomous driving (AD) paradigm. However, the imitative learning nature of SFT still limits the generalization of reasoning, thereby constraining the full potential of driving performance. Meanwhile, current RFT approaches are primarily applied to downstream tasks, since scene understanding is an open-ended problem whose rewards are difficult to quantify. To address these limitations, we propose OpenREAD, an OPEN-ended REasoning reinforced vision-language model (VLM)-based AD framework that enables end-to-end RFT across the full spectrum from high-level reasoning to low-level trajectory planning. Specifically, we begin by constructing large-scale Chain-of-Thought (CoT) annotations on open-source driving-knowledge datasets, and employ the powerful Qwen3 large language model (LLM) as the critic in RFT to quantify the reasoning quality of open-ended answers during reward modeling. Extensive experiments confirm that joint end-to-end RFT yields substantial improvements in both upstream and downstream tasks, enabling OpenREAD to achieve state-of-the-art performance on reasoning and planning benchmarks.
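The abstract does not specify how the Qwen3 critic turns open-ended answers into scalar rewards. The following is a minimal Python sketch of the general LLM-as-critic reward-modeling pattern it describes: a judge model grades the policy's chain-of-thought against a reference answer, and the parsed score becomes the RFT reward. All names here (`query_critic`, the prompt template, the 0-10 scale) are illustrative assumptions, not the paper's implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical grading prompt; the paper's actual critic prompt is not given.
CRITIC_PROMPT = """You are a strict grader for autonomous-driving reasoning.
Question: {question}
Reference answer: {reference}
Model reasoning: {reasoning}
Rate the reasoning's correctness and completeness on a 0-10 scale.
Reply with a single line: SCORE: <number>"""

@dataclass
class Sample:
    question: str
    reference: str
    reasoning: str  # the policy VLM's chain-of-thought answer

def query_critic(prompt: str) -> str:
    """Placeholder for a call to the critic LLM (e.g., Qwen3).
    In practice this would invoke an inference endpoint; here it
    returns a canned response so the sketch runs standalone."""
    return "SCORE: 7"

def reasoning_reward(sample: Sample) -> float:
    """Map the critic's free-form verdict to a scalar reward in [0, 1]."""
    verdict = query_critic(CRITIC_PROMPT.format(
        question=sample.question,
        reference=sample.reference,
        reasoning=sample.reasoning,
    ))
    match = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)", verdict)
    if match is None:
        return 0.0  # unparseable verdicts earn no reward
    return min(float(match.group(1)), 10.0) / 10.0

if __name__ == "__main__":
    s = Sample(
        question="Should the ego vehicle yield at this crosswalk?",
        reference="Yes; a pedestrian is entering the crosswalk.",
        reasoning="A pedestrian is stepping in, so the ego car must yield.",
    )
    print(reasoning_reward(s))  # 0.7 with the canned critic response
```

In an actual RFT loop, a scalar like this would be combined with rewards from the downstream planning task (e.g., trajectory error) and fed to the policy-gradient update; the combination scheme is not described in the abstract.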