LLM-Mediated Guidance of MARL Systems
March 16, 2025
Authors: Philipp D. Siedler, Ian Gemp
cs.AI
Abstract
In complex multi-agent environments, achieving efficient learning and
desirable behaviours is a significant challenge for Multi-Agent Reinforcement
Learning (MARL) systems. This work explores the potential of combining MARL
with Large Language Model (LLM)-mediated interventions to guide agents toward
more desirable behaviours. Specifically, we investigate how LLMs can be used to
interpret and facilitate interventions that shape the learning trajectories of
multiple agents. We experimented with two types of interventions, referred to
as controllers: a Natural Language (NL) Controller and a Rule-Based (RB)
Controller. The NL Controller, which uses an LLM to simulate human-like
interventions, showed a stronger impact than the RB Controller. Our findings
indicate that agents particularly benefit from early interventions, leading to
more efficient training and higher performance. Both intervention types
outperform the baseline without interventions, highlighting the potential of
LLM-mediated guidance to accelerate training and enhance MARL performance in
challenging environments.
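The abstract does not say how controller outputs reach the agents. One plausible reading is a mediation layer that overrides the policies' proposed actions while an early-intervention window is open. The following is a minimal sketch under that assumption. It is not the authors' implementation. All names (`rule_based_controller`, `keyword_parser`, `make_nl_controller`, `intervention_active`, `mediate`), the action vocabulary, the low-health rule, and the 20% early-intervention window are hypothetical. `keyword_parser` is a toy stand-in for the LLM that would interpret a natural-language instruction in an NL controller.

```python
from typing import Callable, Dict

Action = int
Actions = Dict[str, Action]
Observations = Dict[str, dict]
Controller = Callable[[Observations], Actions]

# Hypothetical action vocabulary for a grid-style environment (assumption).
ACTION_IDS = {"stay": 0, "up": 1, "down": 2, "left": 3, "right": 4}


def rule_based_controller(observations: Observations) -> Actions:
    """Hypothetical RB controller: hand-written rules mapping state to actions."""
    overrides: Actions = {}
    for agent, obs in observations.items():
        # Example rule (assumption): an agent with low health is told to stay put.
        if obs.get("health", 1.0) < 0.3:
            overrides[agent] = ACTION_IDS["stay"]
    return overrides


def keyword_parser(instruction: str) -> Actions:
    """Toy stand-in for an LLM: maps 'agent_0 left; agent_1 up' to action ids."""
    overrides: Actions = {}
    for clause in instruction.split(";"):
        words = clause.strip().split()
        if len(words) == 2 and words[1] in ACTION_IDS:
            overrides[words[0]] = ACTION_IDS[words[1]]
    return overrides


def make_nl_controller(instruction: str, interpret: Callable[[str], Actions]) -> Controller:
    """Wrap a natural-language instruction and an interpreter into a controller."""
    def controller(observations: Observations) -> Actions:
        parsed = interpret(instruction)
        # Keep only overrides for agents that are present in this step.
        return {a: act for a, act in parsed.items() if a in observations}
    return controller


def intervention_active(step: int, total_steps: int, early_fraction: float = 0.2) -> bool:
    """Early-intervention schedule (assumption): intervene only in the first fraction of training."""
    return step < early_fraction * total_steps


def mediate(policy_actions: Actions, observations: Observations,
            controller: Controller, step: int, total_steps: int) -> Actions:
    """Replace policy actions with controller suggestions while the schedule is active."""
    merged = dict(policy_actions)
    if intervention_active(step, total_steps):
        merged.update(controller(observations))
    return merged


# Usage example with two agents.
obs = {"agent_0": {"health": 0.9}, "agent_1": {"health": 0.1}}
policy = {"agent_0": ACTION_IDS["right"], "agent_1": ACTION_IDS["up"]}
nl = make_nl_controller("agent_0 left; agent_1 up", keyword_parser)

print(mediate(policy, obs, nl, step=10, total_steps=1000))                     # {'agent_0': 3, 'agent_1': 1}
print(mediate(policy, obs, rule_based_controller, step=10, total_steps=1000))  # {'agent_0': 4, 'agent_1': 0}
print(mediate(policy, obs, nl, step=900, total_steps=1000))                    # {'agent_0': 4, 'agent_1': 1}
```

Gating every controller through one schedule function makes the "early intervention" variable explicit and easy to sweep. A fuller setup would also have to decide whether overridden actions are treated as the agents' own when computing policy updates, which this sketch leaves open.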