SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
May 24, 2023
作者: Yue Wu, So Yeon Min, Shrimai Prabhumoye, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, Yuanzhi Li
cs.AI
Abstract
Open-world survival games pose significant challenges for AI algorithms due
to their multi-tasking, deep exploration, and goal prioritization requirements.
Despite reinforcement learning (RL) being popular for solving games, its high
sample complexity limits its effectiveness in complex open-world games like
Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's
original academic paper and use the knowledge learned to reason and play the
game through a large language model (LLM). Prompted with the LaTeX source as
game context and a description of the agent's current observation, our SPRING
framework employs a directed acyclic graph (DAG) with game-related questions as
nodes and dependencies as edges. We identify the optimal action to take in the
environment by traversing the DAG and calculating LLM responses for each node
in topological order, with the LLM's answer to the final node directly translating
to environment actions. In our experiments, we study the quality of in-context
"reasoning" induced by different forms of prompts under the setting of the
Crafter open-world environment. Our experiments suggest that LLMs, when
prompted with consistent chain-of-thought, have great potential in completing
sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4
outperforms all state-of-the-art RL baselines trained for 1M steps, without
any training. Finally, we show the potential of games as a test bed for LLMs.
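The core mechanism described above can be sketched in a few lines: answer each question node in topological order, feeding parent answers into each prompt, and treat the final node's answer as the environment action. This is a minimal illustrative sketch, not the paper's implementation; `spring_step`, the `llm` callable, and the example question graph are all assumed names, and the choice of the last node in topological order as the action node is a simplifying assumption.

```python
from graphlib import TopologicalSorter

def spring_step(questions, deps, context, observation, llm):
    """One SPRING-style decision step (illustrative sketch).

    questions: {node: question text}
    deps:      {node: set of parent nodes whose answers the node depends on}
    context:   background text (e.g. the game's LaTeX source)
    llm:       callable prompt -> answer string (assumed interface)
    """
    # Predecessors-first ordering of the question DAG.
    order = list(TopologicalSorter(deps).static_order())
    answers = {}
    for node in order:
        # Fold the parents' Q/A pairs into this node's prompt.
        parent_qa = "\n".join(
            f"Q: {questions[p]}\nA: {answers[p]}" for p in deps.get(node, ())
        )
        prompt = f"{context}\n{observation}\n{parent_qa}\nQ: {questions[node]}\nA:"
        answers[node] = llm(prompt)
    # Assumption: the final node in topological order is the action node.
    return answers[order[-1]]
```

With a chain such as "Is there a tree nearby?" → "Which action should the agent take?", the second prompt contains the first question's answer, and the returned string is mapped to an environment action.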