V2V-GoT: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multimodal Large Language Models and Graph-of-Thoughts

Abstract

Current state-of-the-art autonomous vehicles could face safety-critical situations when their local sensors are occluded by large nearby objects on the road. Vehicle-to-vehicle (V2V) cooperative autonomous driving has been proposed as a means of addressing this problem, and one recently introduced framework for cooperative autonomous driving has further adopted an approach that incorporates a Multimodal Large Language Model (MLLM) to integrate cooperative perception and planning processes. However, despite the potential benefit of applying graph-of-thoughts reasoning to the MLLM, this idea has not been considered by previous cooperative autonomous driving research. In this paper, we propose a novel graph-of-thoughts framework specifically designed for MLLM-based cooperative autonomous driving. Our graph-of-thoughts includes our proposed novel ideas of occlusion-aware perception and planning-aware prediction. We curate the V2V-GoT-QA dataset and develop the V2V-GoT model for training and testing the cooperative driving graph-of-thoughts. Our experimental results show that our method outperforms other baselines in cooperative perception, prediction, and planning tasks.
