NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving

Abstract

Understanding risk in autonomous driving requires not only perception and prediction but also high-level reasoning about agent behavior and context. Current methods based on Vision Language Models (VLMs) primarily ground agents in static images and provide qualitative judgments; they lack the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2,900 scenarios and 1.1 million agent-level samples, built on real-world data from nuScenes and Waymo and supplemented with safety-critical scenarios from the CommonRoad simulator. The dataset provides Bird-Eye-View (BEV) based sequential images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, reaching a peak accuracy of only 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lacked. While this represents a significant step forward, the modest accuracy underscores the profound challenge of the task, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving.
