Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
August 29, 2025
Authors: Yi Liao, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang
cs.AI
Abstract
Large language models (LLMs) excel at complex reasoning tasks such as
mathematics and coding, yet they frequently struggle with simple interactive
tasks that young children perform effortlessly. This discrepancy highlights a
critical gap between declarative knowledge (knowing about something) and
procedural knowledge (knowing how to do something). Although traditional
reinforcement learning (RL) agents can acquire procedural knowledge through
environmental interaction, they often operate as black boxes and require
substantial training data. In contrast, LLMs possess extensive world knowledge
and reasoning capabilities, but are unable to effectively convert this static
knowledge into dynamic decision-making in interactive settings. To address this
challenge, we propose Think in Games (TiG), a novel framework that empowers
LLMs to develop procedural understanding through direct interaction with game
environments, while retaining their inherent reasoning and explanatory
abilities. Specifically, TiG reformulates RL-based decision-making as a
language modeling task: LLMs generate language-guided policies, which are
refined iteratively through online reinforcement learning based on
environmental feedback. Our experimental results show that TiG successfully
bridges the gap between declarative and procedural knowledge, achieving
competitive performance with dramatically lower data and computational demands
compared to conventional RL methods. Moreover, TiG provides step-by-step
natural language explanations for its decisions, greatly improving transparency
and interpretability in complex interactive tasks.
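The core loop described above (a language policy over natural-language action descriptions, refined online from environment reward) can be illustrated with a toy sketch. Everything here is hypothetical: the candidate actions, the `game_reward` function, and the tabular REINFORCE-style update stand in for the paper's actual LLM policy, game environment, and RL algorithm.

```python
import math
import random

# Toy stand-in for a TiG-style setup: the "policy" scores a few
# natural-language action descriptions, samples one, receives reward
# from a toy environment, and updates its scores online.
ACTIONS = [
    "push the top lane to pressure the enemy tower",
    "retreat and defend the base",
    "group with teammates for a team fight",
]

def game_reward(action_idx, state):
    # Illustrative environment feedback: the best action depends on a
    # scalar "state" (not the paper's actual game environment).
    best = 0 if state > 0.5 else 1
    return 1.0 if action_idx == best else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One logit per (state bucket, action): the policy parameters.
    logits = {bucket: [0.0] * len(ACTIONS) for bucket in (0, 1)}
    for _ in range(episodes):
        state = rng.random()
        bucket = 1 if state > 0.5 else 0
        probs = softmax(logits[bucket])
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = game_reward(a, state)
        # REINFORCE-style update with a constant 0.5 baseline.
        for i in range(len(ACTIONS)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[bucket][i] += lr * (r - 0.5) * grad
    return logits

if __name__ == "__main__":
    logits = train()
    # After training, the policy prefers "push" when state > 0.5
    # and "retreat" otherwise.
    for bucket in (0, 1):
        best = max(range(len(ACTIONS)), key=lambda i: logits[bucket][i])
        print(bucket, ACTIONS[best])
```

In the actual framework an LLM would generate the language-guided policy (including its step-by-step explanation) and the update would be an online RL fine-tuning step on the model's parameters; the sketch only mirrors the interaction-and-update structure.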