
V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts

September 22, 2025
Authors: Hsu-kuang Chiu, Ryo Hachiuma, Chien-Yi Wang, Yu-Chiang Frank Wang, Min-Hung Chen, Stephen F. Smith
cs.AI

Abstract

Current state-of-the-art autonomous vehicles can face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed to address this problem, and one recently introduced cooperative driving framework further incorporates a Multimodal Large Language Model (MLLM) to integrate the cooperative perception and planning processes. However, despite the potential benefits of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered in previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts incorporates two novel ideas: occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines on cooperative perception, prediction, and planning tasks.
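The abstract does not include implementation details, but the pipeline it describes lends itself to a small illustration. Below is a minimal, hypothetical Python sketch of a graph-of-thoughts over three chained thoughts (occlusion-aware perception, planning-aware prediction, planning); the `ThoughtNode` structure, the `run_mllm` stub, and all prompts are assumptions made for illustration, not the paper's actual V2V-GoT API.

```python
# Hypothetical sketch of a graph-of-thoughts pipeline for cooperative driving,
# following the three stages named in the abstract. Node names, fields, and
# the run_mllm stub are illustrative assumptions, not the paper's code.
from dataclasses import dataclass, field


@dataclass
class ThoughtNode:
    name: str                       # e.g. "occlusion_aware_perception"
    prompt: str                     # question posed to the MLLM at this node
    parents: list["ThoughtNode"] = field(default_factory=list)
    answer: str | None = None       # cached MLLM answer for this thought


def run_mllm(prompt: str, context: list[str]) -> str:
    """Stand-in for an MLLM call that would fuse shared V2V sensor
    features with the textual prompt and parent-thought answers."""
    return f"answer({prompt!r})"    # placeholder output


def execute(node: ThoughtNode) -> str:
    """Evaluate a thought after all of its parent thoughts, with caching."""
    if node.answer is None:
        context = [execute(p) for p in node.parents]
        node.answer = run_mllm(node.prompt, context)
    return node.answer


# Wire the graph as the abstract describes: perception informs prediction,
# and both feed the final planning thought.
perception = ThoughtNode(
    "occlusion_aware_perception",
    "Which nearby objects may be occluded from the ego vehicle's view?")
prediction = ThoughtNode(
    "planning_aware_prediction",
    "How will nearby agents move, given the ego vehicle's likely plans?",
    parents=[perception])
planning = ThoughtNode(
    "planning",
    "What is a safe trajectory for the ego vehicle?",
    parents=[perception, prediction])

print(execute(planning))
```

The DAG structure is what distinguishes a graph-of-thoughts from a linear chain-of-thought: a later thought (here, planning) can condition on the answers of multiple earlier thoughts rather than only its immediate predecessor.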