Multi-Agent Evolve: LLM Self-Improve through Co-evolution
October 27, 2025
Authors: Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhan, Mostofa Patwary, Jiaxuan You
cs.AI
Abstract
Reinforcement Learning (RL) has demonstrated significant potential in
enhancing the reasoning capabilities of large language models (LLMs). However,
the success of RL for LLMs relies heavily on human-curated datasets and
verifiable rewards, which limits its scalability and generality. Recent
Self-Play RL methods, inspired by the success of the paradigm in games and Go,
aim to enhance LLM reasoning capabilities without human-annotated data.
However, these methods depend primarily on a grounded environment for feedback
(e.g., a Python interpreter or a game engine), so extending them to general
domains remains challenging. To address these challenges, we propose
Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in
solving diverse tasks, including mathematics, reasoning, and general knowledge
Q&A. The core design of MAE is based on a triplet of interacting agents
(Proposer, Solver, Judge) that are instantiated from a single LLM, and applies
reinforcement learning to optimize their behaviors. The Proposer generates
questions, the Solver attempts solutions, and the Judge evaluates both while
co-evolving. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves
an average improvement of 4.54% on multiple benchmarks. These results highlight
MAE as a scalable, data-efficient method for enhancing the general reasoning
abilities of LLMs with minimal reliance on human-curated supervision.
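
The role structure described above can be illustrated with a minimal sketch. The abstract does not specify prompts, reward definitions, or the RL algorithm, so everything named here (`Model`, `run_round`, `parse_score`, `toy_model`, and the prompt strings) is a hypothetical placeholder, and the policy-update step is omitted entirely. The sketch only shows one model function playing all three roles in a single round, with the Judge scoring both the question and the answer.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


@dataclass
class RoundResult:
    question: str
    answer: str
    question_score: float  # Judge's score for the Proposer's question
    answer_score: float    # Judge's score for the Solver's answer


def parse_score(text: str) -> float:
    """Parse a Judge reply into a score clamped to [0, 1]; unparsable replies score 0."""
    try:
        return min(1.0, max(0.0, float(text.strip())))
    except ValueError:
        return 0.0


def run_round(model: Model, topic: str) -> RoundResult:
    """Run one Proposer -> Solver -> Judge round with a single shared model."""
    question = model(f"[Proposer] Write one question about: {topic}")
    answer = model(f"[Solver] Answer the question: {question}")
    question_score = parse_score(
        model(f"[Judge] Rate this question from 0 to 1: {question}")
    )
    answer_score = parse_score(
        model(f"[Judge] Rate this answer from 0 to 1.\nQ: {question}\nA: {answer}")
    )
    return RoundResult(question, answer, question_score, answer_score)


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM (assumption: illustration only)."""
    if prompt.startswith("[Proposer]"):
        return "What is 2 + 3?"
    if prompt.startswith("[Solver]"):
        return "5"
    return "0.8"  # every Judge prompt gets the same fixed score


result = run_round(toy_model, "arithmetic")
print(result)
```

In a training setup, the two scores would presumably serve as reward signals for the Proposer and Solver roles, but how MAE turns them into updates is not stated in the abstract.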