Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

Abstract

The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on immense experience, encoding their knowledge opaquely within neural network weights. We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a novel agent architecture that leverages a Large Language Model (LLM) to build an explicit, language-based understanding of its environment's mechanics and its own strategy. Starting from a tabula rasa state with no prior knowledge except the action set, CEL operates on a cycle of interaction and reflection. After each episode, the agent analyzes its complete trajectory to perform two concurrent learning processes: Rule Induction, where it refines its explicit model of the environment's dynamics, and Strategy and Playbook Summarization, where it distills experiences into an actionable strategic playbook. We evaluate CEL on diverse grid-world tasks (i.e., Minesweeper, Frozen Lake, and Sokoban), and show that the CEL agent successfully learns to master these games by autonomously discovering their rules and developing effective policies from sparse rewards. Ablation studies confirm that the iterative process is critical for sustained learning. Our work demonstrates a path toward more general and interpretable agents that not only act effectively but also build a transparent and improving model of their world through explicit reasoning on raw experience.
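The interaction-and-reflection cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `CELAgentSketch`, the toy `Corridor` environment, the prompt wording, and the `LLM` callable type are all hypothetical names introduced here for illustration, and the two post-episode updates run one after the other rather than concurrently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: an "LLM" is any function from a prompt string to a reply string.
LLM = Callable[[str], str]
Step = Tuple[str, str, float]  # (observation, action, reward)


@dataclass
class CELAgentSketch:
    """Hypothetical agent holding its knowledge as plain text, not weights."""
    llm: LLM
    actions: List[str]
    rules: str = ""     # explicit, language-based model of the dynamics
    playbook: str = ""  # distilled strategic advice

    def act(self, observation: str) -> str:
        prompt = (
            f"Rules:\n{self.rules}\nPlaybook:\n{self.playbook}\n"
            f"Observation:\n{observation}\nChoose one of {self.actions}."
        )
        choice = self.llm(prompt).strip()
        # Fall back to the first action if the reply is not a legal action.
        return choice if choice in self.actions else self.actions[0]

    def reflect(self, trajectory: List[Step]) -> None:
        log = "\n".join(f"{o} | {a} | {r}" for o, a, r in trajectory)
        # Rule induction: revise the textual model of the environment.
        self.rules = self.llm(
            f"Current rules:\n{self.rules}\nTrajectory:\n{log}\nRevise the rules."
        )
        # Playbook summarization: revise the textual strategy notes.
        self.playbook = self.llm(
            f"Current playbook:\n{self.playbook}\nTrajectory:\n{log}\n"
            "Revise the playbook."
        )


class Corridor:
    """Toy stand-in environment with a sparse reward at position `goal`."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.pos = 0

    def reset(self) -> str:
        self.pos = 0
        return f"pos={self.pos}"

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.pos = max(0, self.pos + (1 if action == "right" else -1))
        done = self.pos == self.goal
        return f"pos={self.pos}", (1.0 if done else 0.0), done


def run_episode(agent: CELAgentSketch, env: Corridor, max_steps: int = 10) -> List[Step]:
    """Play one episode, then reflect on the complete trajectory."""
    obs = env.reset()
    trajectory: List[Step] = []
    for _ in range(max_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
        if done:
            break
    agent.reflect(trajectory)
    return trajectory


def stub_llm(prompt: str) -> str:
    """Placeholder for a real model: always moves right, returns dummy text."""
    if "Choose one of" in prompt:
        return "right"
    return "(revised text would go here)"
```

Calling `run_episode(CELAgentSketch(stub_llm, ["left", "right"]), Corridor())` plays one episode and then overwrites `rules` and `playbook` with the stub's replies. With a real model in place of `stub_llm`, those two strings would carry what the agent has learned from one episode to the next.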
