Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving
February 26, 2026
Authors: Jiangxin Sun, Feng Xue, Teng Long, Chang Liu, Jian-Fang Hu, Wei-Shi Zheng, Nicu Sebe
cs.AI
Abstract
Driven by advances in imitation learning (IL) and large-scale driving datasets, end-to-end autonomous driving (E2E-AD) has recently made substantial progress. IL-based methods are now the mainstream paradigm: models rely on standard driving behaviors demonstrated by experts and learn by minimizing the discrepancy between their own actions and the expert actions. However, this objective of "only driving like the expert" generalizes poorly: in rare or unseen long-tail scenarios outside the distribution of expert demonstrations, models lack prior experience and tend to produce unsafe decisions. This raises a fundamental question: can an E2E-AD system make reliable decisions without any expert action supervision? Motivated by this question, we propose a unified framework, Risk-aware World Model Predictive Control (RaWMPC), which addresses this generalization dilemma through robust control without relying on expert demonstrations. Specifically, RaWMPC uses a world model to predict the consequences of multiple candidate actions and selects low-risk actions through explicit risk evaluation. To enable the world model to predict the outcomes of risky driving behaviors, we design a risk-aware interaction strategy that systematically exposes it to hazardous behaviors, making catastrophic outcomes predictable and thus avoidable. Furthermore, to generate low-risk candidate actions at test time, we introduce a self-evaluation distillation method that distills the risk-avoidance capabilities of the well-trained world model into a generative action proposal network, again without any expert demonstrations. Extensive experiments show that RaWMPC outperforms state-of-the-art methods in both in-distribution and out-of-distribution scenarios while providing superior decision interpretability.
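The selection step described in the abstract (predict the consequences of several candidate actions with a world model, score each outcome with an explicit risk measure, keep the lowest-risk candidate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the 1-D `State`, the kinematic `toy_world_model` standing in for a learned world model, the hand-written `toy_risk` function, and the three fixed candidate plans standing in for the output of an action proposal network.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class State:
    """Toy 1-D ego state (hypothetical; not the paper's learned representation)."""
    position: float  # metres along the lane
    speed: float     # metres per second


Plan = Tuple[float, ...]  # sequence of longitudinal accelerations in m/s^2


def toy_world_model(state: State, accel: float, dt: float = 0.5) -> State:
    """Stand-in for a learned world model: one simple kinematic step."""
    speed = max(0.0, state.speed + accel * dt)
    return State(position=state.position + speed * dt, speed=speed)


def toy_risk(state: State, obstacle_position: float = 30.0) -> float:
    """Stand-in for explicit risk evaluation (assumed form, not from the paper)."""
    gap = obstacle_position - state.position
    if gap <= 0.0:
        return 1000.0                               # reached or passed the obstacle
    stalled = 0.01 if state.speed == 0.0 else 0.0   # mild cost for standing still
    return 1.0 / gap + stalled                      # smaller gap means higher risk


def rollout_risk(
    state: State,
    plan: Plan,
    world_model: Callable[[State, float], State],
    risk_fn: Callable[[State], float],
) -> float:
    """Roll a candidate plan through the world model and accumulate predicted risk."""
    total = 0.0
    for accel in plan:
        state = world_model(state, accel)
        total += risk_fn(state)
    return total


def select_low_risk(
    state: State,
    candidates: Sequence[Plan],
    world_model: Callable[[State, float], State],
    risk_fn: Callable[[State], float],
) -> Tuple[float, Plan]:
    """Return (risk, plan) for the candidate with the lowest predicted risk."""
    scored = [(rollout_risk(state, plan, world_model, risk_fn), plan) for plan in candidates]
    return min(scored, key=lambda pair: pair[0])


# Three hypothetical candidate plans over a 3-second horizon (6 steps of 0.5 s).
candidates = [
    (2.0,) * 6,    # keep accelerating
    (0.0,) * 6,    # hold speed
    (-4.0,) * 6,   # brake
]
start = State(position=0.0, speed=10.0)
best_risk, best_plan = select_low_risk(start, candidates, toy_world_model, toy_risk)
print(best_plan, round(best_risk, 3))  # the braking plan is selected in this toy setup
```

In this toy setup, both the accelerating and the speed-holding plans reach the obstacle within the horizon, so only the braking plan avoids the collision penalty. In a receding-horizon controller, only the first action of the selected plan would typically be executed before re-planning from the next observation.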