SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
May 24, 2023
作者: Yue Wu, So Yeon Min, Shrimai Prabhumoye, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, Yuanzhi Li
cs.AI
Abstract
Open-world survival games pose significant challenges for AI algorithms due
to their multi-tasking, deep exploration, and goal prioritization requirements.
Despite reinforcement learning (RL) being popular for solving games, its high
sample complexity limits its effectiveness in complex open-world games like
Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's
original academic paper and use the knowledge learned to reason and play the
game through a large language model (LLM). Prompted with the LaTeX source as
game context and a description of the agent's current observation, our SPRING
framework employs a directed acyclic graph (DAG) with game-related questions as
nodes and dependencies as edges. We identify the optimal action to take in the
environment by traversing the DAG and calculating LLM responses for each node
in topological order, with the LLM's answer to the final node directly translating
to environment actions. In our experiments, we study the quality of in-context
"reasoning" induced by different forms of prompts under the setting of the
Crafter open-world environment. Our experiments suggest that LLMs, when
prompted with consistent chain-of-thought, have great potential in completing
sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4
outperforms all state-of-the-art RL baselines trained for 1M steps, without
any training. Finally, we show the potential of games as a test bed for LLMs.
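The core mechanism described above can be sketched in a few lines: answer each question node in topological order, feeding parent answers into each prompt, and treat the final node's answer as the environment action. This is a minimal illustrative sketch, not the paper's implementation; `spring_step`, the `llm` callable, and the example question graph are all assumed names, and the choice of the last node in topological order as the action node is a simplifying assumption.

```python
from graphlib import TopologicalSorter

def spring_step(questions, deps, context, observation, llm):
    """One SPRING-style decision step (illustrative sketch).

    questions: {node: question text}
    deps:      {node: set of parent nodes whose answers the node depends on}
    context:   background text (e.g. the game's LaTeX source)
    llm:       callable prompt -> answer string (assumed interface)
    """
    # Predecessors-first ordering of the question DAG.
    order = list(TopologicalSorter(deps).static_order())
    answers = {}
    for node in order:
        # Fold the parents' Q/A pairs into this node's prompt.
        parent_qa = "\n".join(
            f"Q: {questions[p]}\nA: {answers[p]}" for p in deps.get(node, ())
        )
        prompt = f"{context}\n{observation}\n{parent_qa}\nQ: {questions[node]}\nA:"
        answers[node] = llm(prompt)
    # Assumption: the final node in topological order is the action node.
    return answers[order[-1]]
```

With a chain such as "Is there a tree nearby?" → "Which action should the agent take?", the second prompt contains the first question's answer, and the returned string is mapped to an environment action.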