Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions

Abstract

Autonomous Vehicles (AVs) have entered the commercialization stage, but their limited ability to interact and express intentions still poses challenges in interactions with Human-driven Vehicles (HVs). Recent advances in large language models (LLMs) enable bidirectional human-machine communication, but the conflict between slow inference speed and the need for real-time decision-making hinders practical deployment. To address these issues, this paper introduces a parallel Actor-Reasoner framework designed to enable explicit bidirectional AV-HV interactions across multiple scenarios. First, by facilitating interactions between the LLM-driven Reasoner and heterogeneous simulated HVs during training, an interaction memory database, referred to as the Actor, is established. Then, by introducing a memory partition module and a two-layer memory retrieval module, the Actor's ability to handle heterogeneous HVs is significantly enhanced. Ablation studies and comparisons with other decision-making methods demonstrate that the proposed Actor-Reasoner framework significantly improves safety and efficiency. Finally, by combining the external Human-Machine Interface (eHMI) information derived from the Reasoner's reasoning with the feasible action solutions retrieved from the Actor, the effectiveness of the proposed Actor-Reasoner framework is confirmed in multi-scenario field interactions. Our code is available at https://github.com/FanGShiYuu/Actor-Reasoner.
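
To make the idea of a partitioned memory with two-layer retrieval more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how memories are partitioned or matched, so everything here is an assumption for illustration: the `Memory` and `Actor` classes, the use of an HV driving-style label as the partition key, the scenario feature tuple, and Euclidean distance as the similarity measure are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Memory:
    style: str    # hypothetical partition key, e.g. an HV driving-style label
    state: tuple  # hypothetical scenario features, e.g. (gap in m, speed in m/s)
    action: str   # action stored for that past interaction


class Actor:
    """Illustrative interaction memory with partitioning and two-layer lookup."""

    def __init__(self):
        # Memory partition (assumed): one list of memories per HV style.
        self.partitions = defaultdict(list)

    def add(self, memory):
        self.partitions[memory.style].append(memory)

    def retrieve(self, style, state):
        # Layer 1: narrow the search to the partition matching the HV style.
        candidates = self.partitions.get(style)
        if not candidates:
            # Unseen style: fall back to searching every stored memory.
            candidates = [m for ms in self.partitions.values() for m in ms]
        if not candidates:
            return None
        # Layer 2: pick the stored state closest to the current one.
        best = min(candidates, key=lambda m: math.dist(m.state, state))
        return best.action


actor = Actor()
actor.add(Memory("aggressive", (5.0, 10.0), "yield"))
actor.add(Memory("cautious", (5.0, 10.0), "proceed"))
actor.add(Memory("aggressive", (30.0, 2.0), "proceed"))
result = actor.retrieve("aggressive", (6.0, 9.0))
```

In this sketch a lookup is a dictionary access followed by a scan of one partition, which is the kind of fast path that could run in real time while a slower LLM-based Reasoner works in parallel. The actual modules are in the linked repository.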
