Agents of Change: Self-Evolving LLM Agents for Strategic Planning
June 5, 2025
Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang
cs.AI
Abstract
Recent advances in LLMs have enabled their use as autonomous agents across a
range of tasks, yet they continue to struggle with formulating and adhering to
coherent long-term strategies. In this paper, we investigate whether LLM agents
can self-improve when placed in environments that explicitly challenge their
strategic planning abilities. Using the board game Settlers of Catan, accessed
through the open-source Catanatron framework, we benchmark a progression of
LLM-based agents, from a simple game-playing agent to systems capable of
autonomously rewriting their own prompts and their player agent's code. We
introduce a multi-agent architecture in which specialized roles (Analyzer,
Researcher, Coder, and Player) collaborate to iteratively analyze gameplay,
research new strategies, and modify the agent's logic or prompt. By comparing
manually crafted agents to those evolved entirely by LLMs, we evaluate how
effectively these systems can diagnose failure and adapt over time. Our results
show that self-evolving agents, particularly when powered by models like Claude
3.7 and GPT-4o, outperform static baselines by autonomously adapting their
strategies, passing along sample behavior to game-playing agents, and
demonstrating adaptive reasoning over multiple iterations.
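The Analyzer–Researcher–Coder–Player loop described above can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: every role is stubbed with a placeholder (a real system would back each role with an LLM call and play full games through the Catanatron framework), and all class, method, and string names are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class EvolvingAgent:
    """Toy self-evolving agent: each iteration plays, diagnoses,
    researches a fix, and rewrites its own strategy prompt.
    All role logic is stubbed (hypothetical, not the paper's code)."""
    prompt: str                       # the Player's strategy prompt, rewritten each iteration
    history: list = field(default_factory=list)

    def play(self) -> dict:
        # Player role (stub): a real run would play a game of Catan
        # via Catanatron and return the move log and outcome.
        return {"won": False, "log": f"played with prompt: {self.prompt}"}

    def analyze(self, result: dict) -> str:
        # Analyzer role (stub): diagnose why the last game was lost.
        return "over-invested in roads; ignored development cards"

    def research(self, diagnosis: str) -> str:
        # Researcher role (stub): propose a counter-strategy.
        return "prioritize ore/wheat settlements and buy development cards"

    def evolve(self, iterations: int = 3) -> str:
        for _ in range(iterations):
            result = self.play()
            if result["won"]:
                break
            diagnosis = self.analyze(result)
            fix = self.research(diagnosis)
            # Coder role: rewrite the Player's prompt (the paper's system
            # can also rewrite the player agent's code itself).
            self.prompt = f"{self.prompt}\nRevised strategy: {fix}"
            self.history.append((diagnosis, fix))
        return self.prompt

agent = EvolvingAgent(prompt="Win at Settlers of Catan.")
final_prompt = agent.evolve(iterations=2)
```

After two iterations the prompt has accumulated two revised-strategy lines, mirroring how the paper's evolved agents adjust strategy and pass refined guidance to the game-playing agent across iterations.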