Bridging the Gap Between Agents and the World: Text World Models for LLM-Based Agents

Abstract

Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely reactive: they map observations to actions without an explicit model of how these environments are structured and how they evolve. This motivates text world models (TWMs): transition models over textual states that, given a state and a candidate action, predict the resulting webpage, terminal output, API response, or user reply, thereby supporting planning, efficient learning, and principled evaluation. We systematically review text world models for LLM-based agents, organized around a formal framework and the agent lifecycle: (1) Foundations, defining text world models and characterizing them by state representation and grounding domain; (2) Construction, taxonomizing the LLM-as-WM and code-as-WM paradigms (WM: world model) and reviewing methods for building them; (3) Application, examining how world models support agents at training time through experience synthesis and at inference time through planning, verification, and adaptation; and (4) Evaluation, covering both the evaluation of the world model itself and its use as an evaluation environment for agents. We aim to consolidate this rapidly developing area, clarify its design space, and highlight open challenges for future research.
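
To make the notion of a transition model over textual states concrete, the following is a minimal sketch in the code-as-WM style, not a method taken from the surveyed literature. It assumes a toy setting in which the state is a newline-separated file listing and the action is a `touch` or `rm` command. The names `TextWorldModel`, `toy_terminal_wm`, and `plan_one_step` are hypothetical and introduced only for illustration.

```python
from typing import Callable, Iterable, Optional

# Hypothetical alias: a text world model maps (state, action) to the
# predicted next textual state.
TextWorldModel = Callable[[str, str], str]


def toy_terminal_wm(state: str, action: str) -> str:
    """Toy code-as-WM: predict the file listing after a shell-like action.

    Assumption: the state is a newline-separated list of file names, and
    only `touch <name>` and `rm <name>` change it. Any other action is
    predicted to leave the listing unchanged.
    """
    files = [name for name in state.split("\n") if name]
    parts = action.split()
    if len(parts) == 2 and parts[0] == "touch":
        if parts[1] not in files:
            files.append(parts[1])
    elif len(parts) == 2 and parts[0] == "rm":
        if parts[1] in files:
            files.remove(parts[1])
    return "\n".join(sorted(files))


def plan_one_step(
    wm: TextWorldModel,
    state: str,
    candidates: Iterable[str],
    goal: Callable[[str], bool],
) -> Optional[str]:
    """One-step lookahead: return the first candidate action whose
    predicted next state satisfies the goal, or None if none does."""
    for action in candidates:
        if goal(wm(state, action)):
            return action
    return None


if __name__ == "__main__":
    state = "a.txt\nb.txt"
    chosen = plan_one_step(
        toy_terminal_wm,
        state,
        ["rm a.txt", "touch c.txt"],
        goal=lambda s: "c.txt" in s.split("\n"),
    )
    print(chosen)
```

In this sketch the agent consults the model before acting, instead of mapping the observation directly to an action. An LLM-as-WM variant would keep the same `(state, action) -> next state` interface and replace the hand-written rules with a model call.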