LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

Abstract

In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, facilitating in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as vision-language models (VLMs) and vision-language-action models (VLAs), goes beyond safety and comfort metrics by also evaluating instruction following and the semantic coherence of model outputs. The multilingual reasoning traces in English, Spanish, and Chinese were provided by domain experts with diverse cultural backgrounds. Thus, our dataset is a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at: https://hf.co/datasets/kit-mrt/kitscenes-longtail
