Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving

Abstract

With advances in imitation learning (IL) and large-scale driving datasets, end-to-end autonomous driving (E2E-AD) has made great progress recently. IL-based methods have become the mainstream paradigm: models rely on standard driving behaviors provided by experts and learn to minimize the discrepancy between their own actions and the expert actions. However, this objective of "only driving like the expert" limits generalization. When a model encounters rare or unseen long-tail scenarios outside the distribution of expert demonstrations, it tends to make unsafe decisions because it lacks prior experience. This raises a fundamental question: can an E2E-AD system make reliable decisions without any expert action supervision? Motivated by this question, we propose Risk-aware World Model Predictive Control (RaWMPC), a unified framework that addresses this generalization dilemma through robust control and does not rely on expert demonstrations. Specifically, RaWMPC uses a world model to predict the consequences of multiple candidate actions and selects low-risk actions through explicit risk evaluation. To enable the world model to predict the outcomes of risky driving behaviors, we design a risk-aware interaction strategy that systematically exposes the world model to hazardous behaviors, making catastrophic outcomes predictable and thus avoidable. Furthermore, to generate low-risk candidate actions at test time, we introduce a self-evaluation distillation method that distills risk-avoidance capabilities from the well-trained world model into a generative action proposal network, again without any expert demonstrations. Extensive experiments show that RaWMPC outperforms state-of-the-art methods in both in-distribution and out-of-distribution scenarios while providing superior decision interpretability.
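The core decision step described above (predict the consequences of several candidate actions with a world model, then keep the action with the lowest explicit risk) can be sketched as follows. This is a minimal illustration under our own assumptions, not the RaWMPC implementation: `toy_world_model` is a hypothetical hand-written 1-D kinematic stand-in for the learned world model, `toy_risk` is a hypothetical risk score, the fixed candidate list stands in for the generative action proposal network, and every constant is illustrative.

```python
"""Minimal sketch of risk-aware action selection with a world model.

All names, dynamics, and constants are hypothetical stand-ins,
not RaWMPC's actual models or risk definition.
"""
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # hypothetical state: (position along lane [m], speed [m/s])
Action = float               # hypothetical action: constant acceleration [m/s^2]

DT = 0.5                # rollout step [s] (assumed)
OBSTACLE_X = 25.0       # static obstacle position [m] (assumed)
SAFE_GAP = 5.0          # gap below which proximity risk grows [m] (assumed)
COLLISION_RISK = 10.0   # risk assigned to a predicted collision (assumed)
PROGRESS_WEIGHT = 0.01  # assumed progress reward, so hard braking is not trivially optimal


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned world model: predicts the next state."""
    x, v = state
    v_next = max(0.0, v + action * DT)  # the car cannot reverse
    return x + v_next * DT, v_next


def rollout(world_model: Callable[[State, Action], State],
            state: State, action: Action, horizon: int) -> List[State]:
    """Imagine the future states under a candidate action held for `horizon` steps."""
    traj = []
    for _ in range(horizon):
        state = world_model(state, action)
        traj.append(state)
    return traj


def toy_risk(traj: Sequence[State]) -> float:
    """Explicit risk: a large penalty for a predicted collision, else a soft proximity term."""
    if any(x >= OBSTACLE_X for x, _ in traj):
        return COLLISION_RISK
    min_gap = min(OBSTACLE_X - x for x, _ in traj)
    return max(0.0, (SAFE_GAP - min_gap) / SAFE_GAP)


def select_action(state: State, candidates: Sequence[Action],
                  world_model: Callable[[State, Action], State],
                  risk_fn: Callable[[Sequence[State]], float],
                  horizon: int = 5) -> Tuple[Action, float]:
    """Score each candidate by predicted risk minus a small progress reward; keep the lowest."""
    best_action, best_cost = candidates[0], float("inf")
    for action in candidates:
        traj = rollout(world_model, state, action, horizon)
        progress = traj[-1][0] - state[0]
        cost = risk_fn(traj) - PROGRESS_WEIGHT * progress
        if cost < best_cost:
            best_action, best_cost = action, cost
    return best_action, best_cost


if __name__ == "__main__":
    # Ego car at x = 0 m moving at 8 m/s; candidates: brake, hold speed, mild and strong acceleration.
    action, cost = select_action((0.0, 8.0), [-4.0, 0.0, 1.0, 2.0], toy_world_model, toy_risk)
    print(action)  # 0.0
```

In this toy setting both accelerating candidates score worse (the stronger one is predicted to collide, the milder one ends inside the safety gap), and the assumed progress term prefers holding speed over braking hard. Without some such term, a pure risk minimizer would always choose the strongest brake; how RaWMPC balances risk against progress is not specified in this abstract.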
