Agents of Change: Self-Evolving LLM Agents for Strategic Planning

Abstract

Recent advances in large language models (LLMs) have enabled their use as autonomous agents across a range of tasks, yet these agents still struggle to formulate and adhere to coherent long-term strategies. In this paper, we investigate whether LLM agents can self-improve when placed in environments that explicitly challenge their strategic planning abilities. Using the board game Settlers of Catan, accessed through the open-source Catanatron framework, we benchmark a progression of LLM-based agents, from a simple game-playing agent to systems capable of autonomously rewriting their own prompts and their player agent's code. We introduce a multi-agent architecture in which specialized roles (Analyzer, Researcher, Coder, and Player) collaborate to iteratively analyze gameplay, research new strategies, and modify the agent's logic or prompt. By comparing manually crafted agents to those evolved entirely by LLMs, we evaluate how effectively these systems can diagnose failure and adapt over time. Our results show that self-evolving agents, particularly when powered by models like Claude 3.7 and GPT-4o, outperform static baselines by autonomously adopting their own strategies, passing along sample behavior to game-playing agents, and demonstrating adaptive reasoning over multiple iterations.
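
The iterative loop among the four roles (Analyzer, Researcher, Coder, Player) can be illustrated with a minimal Python sketch. This is not the paper's implementation and does not use the Catanatron API: the names `AgentState` and `evolve`, the callable signatures, and the rule of keeping a rewritten agent only when its score improves are all assumptions made for illustration. In a real system each callable would wrap an LLM call or a batch of simulated games.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class AgentState:
    """Hypothetical container for what the system may rewrite."""
    prompt: str  # instructions given to the game-playing agent
    code: str    # source of the player agent's decision logic


def evolve(
    initial: AgentState,
    play: Callable[[AgentState], float],               # Player: returns a score
    analyze: Callable[[AgentState, float], str],       # Analyzer: diagnoses failure
    research: Callable[[str], str],                    # Researcher: proposes a strategy
    rewrite: Callable[[AgentState, str], AgentState],  # Coder: edits prompt or code
    iterations: int = 3,
) -> Tuple[AgentState, List[float]]:
    """Run the analyze -> research -> rewrite -> play loop.

    Assumption: a candidate replaces the current best agent only if it
    scores strictly higher; the source does not specify a selection rule.
    """
    best, best_score = initial, play(initial)
    scores = [best_score]
    for _ in range(iterations):
        diagnosis = analyze(best, best_score)
        strategy = research(diagnosis)
        candidate = rewrite(best, strategy)
        score = play(candidate)
        scores.append(score)
        if score > best_score:
            best, best_score = candidate, score
    return best, scores
```

Passing the roles in as plain callables keeps the loop independent of any particular model or game framework, so a static baseline corresponds to calling `play` on the initial state alone.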
