

Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

August 29, 2025
Authors: Yi Liao, Yu Gu, Yuan Sui, Zining Zhu, Yifan Lu, Guohua Tang, Zhongqian Sun, Wei Yang
cs.AI

Abstract

Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.
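The core idea — treating RL-based decision-making as a language modeling task, where a policy over language-guided actions is refined online from environment feedback — can be illustrated with a minimal toy sketch. Everything below is illustrative: the action strings, the stub environment, and the reward-weighted update are assumptions standing in for the paper's actual LLM policy and game environment, not the TiG implementation.

```python
import random

random.seed(0)

# Toy stand-in for an LLM policy: a distribution over natural-language
# action strategies for a given game state. Names are hypothetical.
ACTIONS = ["push the top lane", "defend the base", "farm the jungle"]

# Preference weights the online RL loop will adjust.
weights = {a: 1.0 for a in ACTIONS}

def sample_action(weights):
    """Sample an action with probability proportional to its weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    upto = 0.0
    for a, w in weights.items():
        upto += w
        if r <= upto:
            return a
    return ACTIONS[-1]

def environment_feedback(action):
    """Stub environment: rewards defending when the base is under attack."""
    return 1.0 if action == "defend the base" else 0.0

# Online RL loop: sample a language-guided action, observe reward,
# and upweight the sampled strategy (a crude REINFORCE-style update).
for step in range(200):
    action = sample_action(weights)
    reward = environment_feedback(action)
    weights[action] *= (1.0 + 0.1 * reward)

best = max(weights, key=weights.get)
print(best)  # the policy concentrates on the rewarded strategy
```

In the actual framework, the "weights" correspond to the LLM's parameters, the sampled string is a full natural-language rationale plus action (which is what gives the method its interpretability), and the update is a proper online policy-gradient step rather than this multiplicative heuristic.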
PDF | September 1, 2025