LLM-Mediated Guidance of MARL Systems

Abstract

In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and facilitate interventions that shape the learning trajectories of multiple agents. We experimented with two types of interventions, referred to as controllers: a Natural Language (NL) Controller and a Rule-Based (RB) Controller. The NL Controller, which uses an LLM to simulate human-like interventions, showed a stronger impact than the RB Controller. Our findings indicate that agents particularly benefit from early interventions, leading to more efficient training and higher performance. Both intervention types outperform the baseline without interventions, highlighting the potential of LLM-mediated guidance to accelerate training and enhance MARL performance in challenging environments.
