NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving

September 30, 2025
Authors: Yuan Gao, Mattia Piccinini, Roberto Brusnicki, Yuchen Zhang, Johannes Betz
cs.AI

Abstract

Understanding risk in autonomous driving requires not only perception and prediction, but also high-level reasoning about agent behavior and context. Current methods based on Vision Language Models (VLMs) primarily ground agents in static images and provide qualitative judgments, lacking the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2,900 scenarios and 1.1 million agent-level samples, built on real-world data from nuScenes and Waymo and supplemented with safety-critical scenarios from the CommonRoad simulator. The dataset provides sequential Bird's-Eye-View (BEV) images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, reaching a peak accuracy of only 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lack. While this represents a significant step forward, the modest accuracy underscores the profound challenge of the task, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving.
October 6, 2025