Bridging the Agent-World Gap: Text World Models for LLM-Based Agents

Abstract

Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many of these agents remain largely reactive, mapping observations to actions without an explicit model of how their environments are structured and evolve. This motivates text world models (TWMs): transition models over textual states that, given a state and a candidate action, predict the resulting webpage, terminal output, API response, or user reply, thereby supporting planning, efficient learning, and principled evaluation. We systematically review text world models for LLM-based agents, organized around a formal framework and the agent lifecycle: (1) Foundations, defining text world models and characterizing them by state representation and grounding domain; (2) Construction, taxonomizing LLM-as-WM and code-as-WM paradigms and reviewing methods for building them; (3) Application, examining how world models support agents at training time through experience synthesis and at inference time through planning, verification, and adaptation; and (4) Evaluation, covering both evaluation of the world model itself and its use as an evaluation environment for agents. We aim to consolidate this rapidly developing area, clarify its design space, and highlight open challenges for future research.
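
To make the transition-model view concrete, the following is a minimal sketch of a TWM interface, assuming states, actions, and predictions are plain strings. Every name here (`TextWorldModel`, `ToyShellWorldModel`, `plan_one_step`) is hypothetical and chosen for illustration; none is taken from the survey. The toy model is a hand-written, code-as-WM style stand-in for a fake file listing, and the planner is a simple one-step lookahead rather than any specific method reviewed in the paper.

```python
from typing import Callable, Protocol, Sequence


class TextWorldModel(Protocol):
    """Hypothetical interface: a transition model over textual states."""

    def predict(self, state: str, action: str) -> str:
        """Return the predicted next textual state."""
        ...


class ToyShellWorldModel:
    """Toy code-as-WM example with hand-written transition rules.

    Assumption for illustration only: the state is a newline-separated
    list of file names, and actions are 'touch NAME' or 'rm NAME'.
    Any other action leaves the state unchanged.
    """

    def predict(self, state: str, action: str) -> str:
        files = [name for name in state.split("\n") if name]
        verb, _, arg = action.partition(" ")
        if verb == "touch" and arg and arg not in files:
            files.append(arg)
        elif verb == "rm" and arg in files:
            files.remove(arg)
        return "\n".join(sorted(files))


def plan_one_step(
    model: TextWorldModel,
    state: str,
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Pick the candidate whose predicted next state scores highest."""
    return max(candidates, key=lambda action: score(model.predict(state, action)))


def goal_score(predicted_state: str) -> float:
    """Hypothetical goal: the file 'c.txt' should exist."""
    return float("c.txt" in predicted_state.split("\n"))


model = ToyShellWorldModel()
state = "a.txt\nb.txt"
best = plan_one_step(model, state, ["rm a.txt", "touch c.txt", "ls"], goal_score)
print(best)  # touch c.txt
```

An LLM-as-WM variant would keep the same `predict` signature and swap the hand-written rules for a prompted or fine-tuned model, which is why the planner is written against the interface rather than the toy class.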