Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

August 29, 2025
作者: Yi Liao, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang
cs.AI

Abstract

Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.
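The loop the abstract describes — an LLM proposes a language-guided policy, the game environment returns feedback, and online RL iteratively refines the policy — can be sketched in miniature. The toy environment, candidate action texts, and the plain REINFORCE update below are all illustrative assumptions, not the paper's actual game, prompts, or RL algorithm; a stand-in preference vector replaces the LLM itself.

```python
import math
import random

# Candidate natural-language decisions the "policy" can emit (hypothetical).
ACTIONS = [
    "push the mid lane to pressure the enemy tower",
    "retreat and farm jungle camps to recover",
    "group with allies to contest the objective",
]

class ToyGameEnv:
    """Stand-in for a real game environment: rewards grouping decisions."""
    def reward(self, action_text: str) -> float:
        return 1.0 if "group" in action_text else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    env = ToyGameEnv()
    logits = [0.0] * len(ACTIONS)  # stand-in for LLM action preferences
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = env.reward(ACTIONS[i])  # environmental feedback
        # REINFORCE: raise the log-prob of the sampled action
        # in proportion to the reward it earned.
        for j in range(len(ACTIONS)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return softmax(logits)

final_probs = train()
best = max(range(len(ACTIONS)), key=lambda k: final_probs[k])
print(ACTIONS[best])
```

In TiG the update target is an LLM generating textual policies rather than a logit vector, which is what preserves the step-by-step natural-language explanations the abstract highlights; the feedback-driven refinement loop is the same shape.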
September 1, 2025