SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning
May 24, 2023
Authors: Yue Wu, So Yeon Min, Shrimai Prabhumoye, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Tom Mitchell, Yuanzhi Li
cs.AI
Abstract
Open-world survival games pose significant challenges for AI algorithms due
to their multi-tasking, deep exploration, and goal prioritization requirements.
Despite reinforcement learning (RL) being popular for solving games, its high
sample complexity limits its effectiveness in complex open-world games like
Crafter or Minecraft. We propose a novel approach, SPRING, to read the game's
original academic paper and use the knowledge learned to reason and play the
game through a large language model (LLM). Prompted with the LaTeX source as
game context and a description of the agent's current observation, our SPRING
framework employs a directed acyclic graph (DAG) with game-related questions as
nodes and dependencies as edges. We identify the optimal action to take in the
environment by traversing the DAG and calculating LLM responses for each node
in topological order, with the LLM's answer to the final node directly translating
to environment actions. In our experiments, we study the quality of in-context
"reasoning" induced by different forms of prompts under the setting of the
Crafter open-world environment. Our experiments suggest that LLMs, when
prompted with consistent chain-of-thought, have great potential in completing
sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4
outperforms all state-of-the-art RL baselines trained for 1M steps, while itself
requiring no training. Finally, we show the potential of games as a test bed for LLMs.
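
The procedure described here, traversing a DAG of game-related questions in topological order, prompting the LLM at each node with the game context, the current observation, and the answers to its parent questions, and mapping the final node's answer to an environment action, can be sketched roughly as follows. This is a minimal illustration, assuming a toy three-question DAG and a hypothetical `query_llm` helper; the actual question set, prompt format, and action mapping are not specified in the abstract.

```python
# A minimal sketch of the question-DAG traversal described above, assuming a
# toy three-question DAG and a placeholder query_llm() helper; none of these
# names or prompts come from the paper itself.
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Maps each question to the set of questions it depends on (its parents).
QUESTION_DAG = {
    "What resources and threats are currently visible?": set(),
    "Which subgoal should the agent pursue next?": {
        "What resources and threats are currently visible?"
    },
    "What single action should the agent take now?": {
        "Which subgoal should the agent pursue next?"
    },
}

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM such as GPT-4 (API details assumed)."""
    raise NotImplementedError

def choose_action(game_context: str, observation: str) -> str:
    """Answer each question in topological order; the last answer is the action."""
    answers: dict[str, str] = {}
    last_answer = ""
    # static_order() yields parents before children, so every question can see
    # the answers of the questions it depends on.
    for question in TopologicalSorter(QUESTION_DAG).static_order():
        parent_qa = "\n".join(
            f"Q: {q}\nA: {answers[q]}" for q in QUESTION_DAG[question]
        )
        prompt = (
            f"{game_context}\n\nObservation: {observation}\n\n"
            f"{parent_qa}\n\nQ: {question}\nA:"
        )
        last_answer = query_llm(prompt)
        answers[question] = last_answer
    return last_answer  # answer to the final node, mapped to an environment action
```

In this sketch the game context would be the LaTeX source of the game's paper and the observation a textual description of the agent's current state; the single returned string stands in for whatever parsing the framework uses to turn the final answer into a concrete environment action.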