
LLM-Mediated Guidance of MARL Systems

March 16, 2025
Authors: Philipp D. Siedler, Ian Gemp
cs.AI

Abstract

In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and facilitate interventions that shape the learning trajectories of multiple agents. We experimented with two types of interventions, referred to as controllers: a Natural Language (NL) Controller and a Rule-Based (RB) Controller. The NL Controller, which uses an LLM to simulate human-like interventions, showed a stronger impact than the RB Controller. Our findings indicate that agents particularly benefit from early interventions, leading to more efficient training and higher performance. Both intervention types outperform the baseline without interventions, highlighting the potential of LLM-mediated guidance to accelerate training and enhance MARL performance in challenging environments.
