V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts
September 22, 2025
Authors: Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
cs.AI
Abstract
Current state-of-the-art autonomous vehicles can face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced cooperative driving framework goes further, incorporating a Multimodal Large Language Model (MLLM) to integrate the cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts incorporates two novel ideas: occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Experimental results show that our method outperforms baseline methods on cooperative perception, prediction, and planning tasks.
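
To make the graph-of-thoughts idea concrete, the Python sketch below shows one plausible way such a reasoning graph could be organized: a small DAG of question-answering nodes in which occlusion-aware perception feeds planning-aware prediction, which in turn feeds planning, with each node answered by an MLLM conditioned on its parents' answers. This is a rough illustration only; the node names, questions, and the `ask_mllm` placeholder are hypothetical and not the paper's actual design.

```python
# Minimal sketch (not the paper's implementation): a cooperative-driving
# graph-of-thoughts modeled as a DAG of QA nodes, evaluated in topological
# order so each node sees the answers of its upstream nodes.
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    name: str
    question: str
    parents: list = field(default_factory=list)  # upstream ThoughtNodes
    answer: str | None = None


def ask_mllm(question: str, context: list) -> str:
    """Hypothetical placeholder for an MLLM call that would consume fused
    V2V sensor features plus the upstream answers in `context`."""
    return f"<answer to: {question}>"


def run_graph(nodes: list) -> None:
    # Nodes are assumed to be listed in topological order.
    for node in nodes:
        context = [p.answer for p in node.parents]
        node.answer = ask_mllm(node.question, context)


# A chain reflecting the abstract's two ideas: occlusion-aware perception
# feeds planning-aware prediction, which feeds the final plan.
perception = ThoughtNode(
    "perception",
    "What objects do the ego and cooperative vehicles detect?")
occlusion = ThoughtNode(
    "occlusion_aware_perception",
    "Which detected objects may hide others from the ego view?",
    [perception])
prediction = ThoughtNode(
    "planning_aware_prediction",
    "How will nearby objects move, given the ego vehicle's intended plan?",
    [occlusion])
planning = ThoughtNode(
    "planning",
    "What trajectory should the ego vehicle follow?",
    [prediction])

run_graph([perception, occlusion, prediction, planning])
print(planning.answer)
```

In this sketch, the graph structure is what distinguishes graph-of-thoughts from a flat prompt: intermediate answers (e.g., which objects are occluders) are made explicit and passed downstream, rather than asking the MLLM to produce a plan in a single step.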