Long-Tail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

Abstract

In real-world domains such as self-driving, generalization to rare scenarios remains a fundamental challenge. To address this, we introduce a new dataset designed for end-to-end driving that focuses on long-tail driving events. We provide multi-view video data, trajectories, high-level instructions, and detailed reasoning traces, facilitating in-context learning and few-shot generalization. The resulting benchmark for multimodal models, such as vision-language models (VLMs) and vision-language-action models (VLAs), goes beyond safety and comfort metrics by also evaluating instruction following and the semantic coherence between model outputs. The multilingual reasoning traces, in English, Spanish, and Chinese, were provided by domain experts with diverse cultural backgrounds. Thus, our dataset is a unique resource for studying how different forms of reasoning affect driving competence. Our dataset is available at: https://hf.co/datasets/kit-mrt/kitscenes-longtail