NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving

September 30, 2025
Authors: Yuan Gao, Mattia Piccinini, Roberto Brusnicki, Yuchen Zhang, Johannes Betz
cs.AI

Abstract

Understanding risk in autonomous driving requires not only perception and prediction but also high-level reasoning about agent behavior and context. Current methods based on Vision Language Models (VLMs) primarily ground agents in static images and provide qualitative judgments; they lack the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2,900 scenarios and 1.1 million agent-level samples, built on real-world data from nuScenes and Waymo and supplemented with safety-critical scenarios from the CommonRoad simulator. The dataset provides sequential Bird's-Eye-View (BEV) images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, reaching a peak accuracy of only 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lacked. While this represents a significant step forward, the modest accuracy underscores how difficult the task remains, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving.