LLM-Mediated Guidance of MARL Systems

Abstract

In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and facilitate interventions that shape the learning trajectories of multiple agents. We experiment with two types of intervention, referred to as controllers: a Natural Language (NL) Controller and a Rule-Based (RB) Controller. The NL Controller, which uses an LLM to simulate human-like interventions, has a stronger impact than the RB Controller. Our findings indicate that agents benefit particularly from early interventions, which lead to more efficient training and higher performance. Both intervention types outperform the no-intervention baseline, highlighting the potential of LLM-mediated guidance to accelerate training and improve MARL performance in challenging environments.
