Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

Abstract

The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on vast amounts of experience and encode their knowledge opaquely within neural network weights. We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a novel agent architecture that leverages a Large Language Model (LLM) to build an explicit, language-based understanding of its environment's mechanics and its own strategy. Starting from a tabula rasa state with no prior knowledge other than the action set, CEL operates on a cycle of interaction and reflection. After each episode, the agent analyzes its complete trajectory to perform two concurrent learning processes: Rule Induction, where it refines its explicit model of the environment's dynamics, and Strategy and Playbook Summarization, where it distills experiences into an actionable strategic playbook. We evaluate CEL on diverse grid-world tasks (Minesweeper, Frozen Lake, and Sokoban) and show that the CEL agent learns to master these games by autonomously discovering their rules and developing effective policies from sparse rewards. Ablation studies confirm that the iterative process is critical for sustained learning. Our work demonstrates a path toward more general and interpretable agents that not only act effectively but also build a transparent and improving model of their world through explicit reasoning on raw experience.
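The cycle of interaction and reflection described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the class name `CELAgentSketch`, the function `run_episode`, the prompt wording, the `llm` callable (prompt in, text out), and the `env_reset` / `env_step` callables are all hypothetical. The sketch also runs the two reflection steps one after the other, whereas the abstract describes them as concurrent, and it reduces planning to a single LLM call per step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these names come from the paper.
LLM = Callable[[str], str]        # prompt in, text out
Step = Tuple[str, str, float]     # (observation, action, reward)


@dataclass
class CELAgentSketch:
    llm: LLM
    actions: List[str]            # the only prior knowledge the agent starts with
    rules: str = ""               # explicit, language-based model of the dynamics
    playbook: str = ""            # distilled strategic advice

    def act(self, observation: str) -> str:
        """Ask the LLM to pick an action given the current rules and playbook."""
        prompt = (
            f"Rules:\n{self.rules}\n\nPlaybook:\n{self.playbook}\n\n"
            f"Observation:\n{observation}\n\nActions: {', '.join(self.actions)}\n"
            "Answer with one action."
        )
        reply = self.llm(prompt).strip()
        # Assumption: fall back to the first action if the reply is not legal.
        return reply if reply in self.actions else self.actions[0]

    def reflect(self, trajectory: List[Step]) -> None:
        """After an episode, revise rules and playbook from the full trajectory."""
        log = "\n".join(f"obs={o!r} action={a} reward={r}" for o, a, r in trajectory)
        # Rule induction: refine the explicit model of the environment.
        self.rules = self.llm(
            f"Current rules:\n{self.rules}\n\nTrajectory:\n{log}\n\nRevise the rules."
        )
        # Strategy and playbook summarization: distill actionable advice.
        self.playbook = self.llm(
            f"Current playbook:\n{self.playbook}\n\nTrajectory:\n{log}\n\nRevise the playbook."
        )


def run_episode(
    agent: CELAgentSketch,
    env_reset: Callable[[], str],
    env_step: Callable[[str], Tuple[str, float, bool]],
    max_steps: int = 50,
) -> List[Step]:
    """Interact for one episode, then reflect on the complete trajectory."""
    observation = env_reset()
    trajectory: List[Step] = []
    for _ in range(max_steps):
        action = agent.act(observation)
        next_observation, reward, done = env_step(action)
        trajectory.append((observation, action, reward))
        observation = next_observation
        if done:
            break
    agent.reflect(trajectory)
    return trajectory
```

Calling `run_episode` repeatedly with the same agent carries the revised `rules` and `playbook` strings into the next episode, which is the iterative aspect the ablation studies are said to probe. Because both strings are plain text, they can be read directly after any episode.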
