V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
September 22, 2025
Authors: Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
cs.AI
Abstract
Current state-of-the-art autonomous vehicles can face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced cooperative driving framework goes further, incorporating a Multimodal Large Language Model (MLLM) to integrate the cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts incorporates two novel ideas: occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Experimental results show that our method outperforms baseline methods on cooperative perception, prediction, and planning tasks.
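
To make the graph-of-thoughts idea concrete, the Python sketch below shows one plausible way such a reasoning graph could be organized: a small DAG of question-answering nodes in which occlusion-aware perception feeds planning-aware prediction, which in turn feeds planning, with each node answered by an MLLM conditioned on its parents' answers. This is a rough illustration only; the node names, questions, and the `ask_mllm` placeholder are hypothetical and not the paper's actual design.

```python
# Minimal sketch (not the paper's implementation): a cooperative-driving
# graph-of-thoughts modeled as a DAG of QA nodes, evaluated in topological
# order so each node sees the answers of its upstream nodes.
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    name: str
    question: str
    parents: list = field(default_factory=list)  # upstream ThoughtNodes
    answer: str | None = None


def ask_mllm(question: str, context: list) -> str:
    """Hypothetical placeholder for an MLLM call that would consume fused
    V2V sensor features plus the upstream answers in `context`."""
    return f"<answer to: {question}>"


def run_graph(nodes: list) -> None:
    # Nodes are assumed to be listed in topological order.
    for node in nodes:
        context = [p.answer for p in node.parents]
        node.answer = ask_mllm(node.question, context)


# A chain reflecting the abstract's two ideas: occlusion-aware perception
# feeds planning-aware prediction, which feeds the final plan.
perception = ThoughtNode(
    "perception",
    "What objects do the ego and cooperative vehicles detect?")
occlusion = ThoughtNode(
    "occlusion_aware_perception",
    "Which detected objects may hide others from the ego view?",
    [perception])
prediction = ThoughtNode(
    "planning_aware_prediction",
    "How will nearby objects move, given the ego vehicle's intended plan?",
    [occlusion])
planning = ThoughtNode(
    "planning",
    "What trajectory should the ego vehicle follow?",
    [prediction])

run_graph([perception, occlusion, prediction, planning])
print(planning.answer)
```

In this sketch, the graph structure is what distinguishes graph-of-thoughts from a flat prompt: intermediate answers (e.g., which objects are occluders) are made explicit and passed downstream, rather than asking the MLLM to produce a plan in a single step.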