CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

October 9, 2025
Authors: Xiangyuan Xue, Yifan Zhou, Guibin Zhang, Zaibin Zhang, Yijiang Li, Chen Zhang, Zhenfei Yin, Philip Torr, Wanli Ouyang, Lei Bai
cs.AI

Abstract

Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining. Recent research has witnessed a transition from reinforcement learning (RL)-free to RL-based methods. Current RL-based methods either rely on dense external reward signals or extract intrinsic reward signals from LLMs themselves. However, these approaches diverge from the self-evolution mechanisms observed in human intelligence, where individuals learn and improve through mutual discussion and collaboration. In this work, we introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions without external supervision. CoMAS generates intrinsic rewards from rich discussion dynamics, employs an LLM-as-a-judge mechanism to formulate these rewards, and optimizes each agent's policy through RL, thereby enabling decentralized and scalable co-evolution. Experimental results demonstrate that CoMAS consistently outperforms untrained agents and achieves state-of-the-art performance across most evaluation settings. Ablation studies confirm the necessity of interaction-based reward signals and reveal promising scalability as the number and diversity of agents increase. These findings establish CoMAS as a novel and effective paradigm for self-evolution in LLM-based agents.
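The abstract describes the CoMAS loop only at a high level (intrinsic rewards derived from interaction, an LLM-as-a-judge, per-agent RL), so the following is a minimal Python sketch under explicit assumptions, not the authors' implementation. It assumes one round in which every agent proposes a solution and every other agent critiques it, a judge that returns a score in [0, 1] for each (solution, critique) pair, and a zero-sum split in which the solver earns the score and the critic earns 1 minus the score. The names `discussion_round`, `AgentLearner`, `Turn`, and the keyword-matching `toy_judge` are hypothetical. A real system would prompt an LLM as the judge and feed each agent's advantage into a policy-gradient update of that agent's model, which is omitted here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not from the paper): an agent maps a prompt to text,
# and a judge maps (task, solution, critique) to a score in [0, 1].
Agent = Callable[[str], str]
Judge = Callable[[str, str, str], float]


@dataclass
class Turn:
    author: int  # index of the agent that produced this turn
    kind: str    # "solution" or "critique"
    text: str


def discussion_round(
    task: str, agents: List[Agent], judge: Judge
) -> Tuple[List[Turn], Dict[int, float]]:
    """One illustrative round: every agent proposes a solution, every other
    agent critiques it, and the judge scores each (solution, critique) pair.
    ASSUMED reward split: the solver earns the score, the critic earns 1 - score."""
    transcript: List[Turn] = []
    raw: Dict[int, List[float]] = {i: [] for i in range(len(agents))}
    solutions = [agent(task) for agent in agents]
    for i, solution in enumerate(solutions):
        transcript.append(Turn(i, "solution", solution))
        for j, critic in enumerate(agents):
            if j == i:
                continue  # agents do not critique their own solutions
            critique = critic(f"Critique: {solution}")
            transcript.append(Turn(j, "critique", critique))
            score = judge(task, solution, critique)
            raw[i].append(score)        # solver: rewarded if the solution holds up
            raw[j].append(1.0 - score)  # critic: rewarded for exposing flaws
    # Average each agent's rewards; an agent with no scored turns gets 0.0.
    rewards = {i: (mean(r) if r else 0.0) for i, r in raw.items()}
    return transcript, rewards


@dataclass
class AgentLearner:
    """Per-agent state kept locally, so no shared critic is needed."""
    baseline: float = 0.0
    beta: float = 0.9  # smoothing factor for the running-mean baseline

    def advantage(self, reward: float) -> float:
        # REINFORCE-style advantage against this agent's own running baseline;
        # a real system would use it to scale the log-prob gradient of the policy.
        adv = reward - self.baseline
        self.baseline = self.beta * self.baseline + (1 - self.beta) * reward
        return adv


# Toy stand-ins: stub agents and a keyword-matching judge in place of an LLM judge.
def careful(prompt: str) -> str:
    return "flaw found in step 2" if prompt.startswith("Critique:") else "answer A"


def lenient(prompt: str) -> str:
    return "looks fine" if prompt.startswith("Critique:") else "answer B"


def toy_judge(task: str, solution: str, critique: str) -> float:
    return 0.0 if "flaw" in critique else 1.0


transcript, rewards = discussion_round("toy task", [careful, lenient], toy_judge)
learners = [AgentLearner() for _ in rewards]
advantages = {i: learners[i].advantage(r) for i, r in rewards.items()}
print(rewards)     # {0: 1.0, 1: 0.0}
print(advantages)  # {0: 1.0, 1: 0.0}
```

Keeping the baseline inside each `AgentLearner` mirrors the decentralized framing in the abstract: each agent's update depends only on its own rewards, so adding agents does not require a shared critic. The abstract does not say whether CoMAS uses a running baseline, group-normalized rewards, or another estimator.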
October 10, 2025