

From Word to World: Can Large Language Models be Implicit Text-based World Models?

December 21, 2025
作者: Yixia Li, Hongru Wang, Jiahao Qiu, Zhenfei Yin, Dongdong Zhang, Cheng Qian, Zeping Li, Pony Ma, Guanhua Chen, Heng Ji, Mengdi Wang
cs.AI

Abstract

Agentic reinforcement learning increasingly relies on experience-driven scaling, yet real-world environments remain non-adaptive, limited in coverage, and difficult to scale. World models offer a potential way to improve learning efficiency through simulated experience, but it remains unclear whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents. We study these questions in text-based environments, which provide a controlled setting for reinterpreting language modeling as next-state prediction under interaction. We introduce a three-level framework for evaluating LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility. Across five representative environments, we find that sufficiently trained world models maintain a coherent latent state, scale predictably with data and model size, and improve agent performance via action verification, synthetic trajectory generation, and warm-starting reinforcement learning. At the same time, these gains depend critically on behavioral coverage and environment complexity, delineating a clear boundary for when world modeling effectively supports agent learning.
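To make the "language modeling as next-state prediction under interaction" framing concrete, here is a minimal sketch. It is not the paper's implementation: the names (`TextWorldModel`, `step`, `rollout`) and the prompt format are illustrative assumptions. The idea is that any callable mapping a prompt string to a completion can act as the dynamics model, and synthetic trajectories fall out of alternating a policy with it.

```python
from typing import Callable, List, Tuple

# Hypothetical LLM interface: any callable mapping a prompt string to a
# completion (e.g., a wrapper around a fine-tuned language model).
LLM = Callable[[str], str]

class TextWorldModel:
    """Sketch of an LLM used as an implicit text-based world model:
    the 'dynamics' are next-state (next-observation) prediction
    conditioned on the full interaction history."""

    def __init__(self, llm: LLM):
        self.llm = llm
        self.history: List[Tuple[str, str]] = []  # (action, observation) pairs

    def reset(self, initial_observation: str) -> str:
        self.history = [("<start>", initial_observation)]
        return initial_observation

    def step(self, action: str) -> str:
        # Serialize the whole history so the model can keep a coherent
        # latent state across turns, then predict the next observation.
        prompt = "\n".join(f"> {a}\n{o}" for a, o in self.history)
        prompt += f"\n> {action}\n"
        next_obs = self.llm(prompt)
        self.history.append((action, next_obs))
        return next_obs

def rollout(model: TextWorldModel, policy: Callable[[str], str],
            start: str, horizon: int) -> List[Tuple[str, str]]:
    """Generate one synthetic trajectory by alternating policy and model."""
    obs = model.reset(start)
    trajectory: List[Tuple[str, str]] = []
    for _ in range(horizon):
        action = policy(obs)
        obs = model.step(action)
        trajectory.append((action, obs))
    return trajectory
```

Trajectories produced this way could, in principle, serve the uses the abstract names: checking a proposed action against the predicted outcome, or collecting synthetic experience to warm-start reinforcement learning.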