
Agents of Change: Self-Evolving LLM Agents for Strategic Planning

June 5, 2025
Authors: Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, William Wang
cs.AI

Abstract

Recent advances in LLMs have enabled their use as autonomous agents across a range of tasks, yet they continue to struggle with formulating and adhering to coherent long-term strategies. In this paper, we investigate whether LLM agents can self-improve when placed in environments that explicitly challenge their strategic planning abilities. Using the board game Settlers of Catan, accessed through the open-source Catanatron framework, we benchmark a progression of LLM-based agents, from a simple game-playing agent to systems capable of autonomously rewriting their own prompts and their player agent's code. We introduce a multi-agent architecture in which specialized roles (Analyzer, Researcher, Coder, and Player) collaborate to iteratively analyze gameplay, research new strategies, and modify the agent's logic or prompt. By comparing manually crafted agents to those evolved entirely by LLMs, we evaluate how effectively these systems can diagnose failure and adapt over time. Our results show that self-evolving agents, particularly when powered by models like Claude 3.7 and GPT-4o, outperform static baselines by autonomously adopting their strategies, passing along sample behavior to game-playing agents, and demonstrating adaptive reasoning over multiple iterations.
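The abstract describes an Analyzer → Researcher → Coder → Player loop that iteratively analyzes gameplay, researches new strategies, and rewrites the player agent's prompt or code. The sketch below illustrates one way such a self-evolution loop could be wired together; it is not the paper's implementation, and all names (`PlayerAgent`, `query_llm`, `run_catan_games`) are hypothetical placeholders rather than Catanatron or paper APIs.

```python
# Hypothetical sketch of the Analyzer/Researcher/Coder/Player loop described
# in the abstract. None of these names come from the paper or from Catanatron.
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlayerAgent:
    """The evolving artifact: the prompt and/or player code the LLM rewrites."""
    prompt: str
    code: str


def evolve(
    agent: PlayerAgent,
    query_llm: Callable[[str], str],                # wraps a model such as Claude 3.7 or GPT-4o
    run_catan_games: Callable[[PlayerAgent], str],  # plays a batch of games, returns logs/stats
    iterations: int = 5,
) -> PlayerAgent:
    for _ in range(iterations):
        logs = run_catan_games(agent)      # Player: play Settlers of Catan (e.g. via Catanatron)

        analysis = query_llm(              # Analyzer: diagnose failures in the game logs
            f"Analyze these Catan game logs and list the agent's main mistakes:\n{logs}"
        )
        strategy = query_llm(              # Researcher: propose a concrete new strategy
            f"Given these weaknesses:\n{analysis}\nPropose a concrete Catan strategy."
        )
        agent.prompt = query_llm(          # Coder: rewrite the player's prompt
            "Rewrite this player prompt to implement the strategy.\n"
            f"Strategy:\n{strategy}\nCurrent prompt:\n{agent.prompt}"
        )
    return agent
```

In the paper the system can also rewrite the player agent's code, not just its prompt; the sketch keeps only the prompt rewrite to stay brief.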