SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning

Abstract

Open-world survival games pose significant challenges for AI algorithms because they require multi-tasking, deep exploration, and goal prioritization. Although reinforcement learning (RL) is popular for solving games, its high sample complexity limits its effectiveness in complex open-world games such as Crafter or Minecraft. We propose a novel approach, SPRING, which reads the game's original academic paper and uses the knowledge learned to reason about and play the game through a large language model (LLM). Prompted with the paper's LaTeX source as game context and a description of the agent's current observation, the SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and computing the LLM's response for each node in topological order; the LLM's answer to the final node translates directly into an environment action. In our experiments, we study the quality of in-context "reasoning" induced by different forms of prompts in the Crafter open-world environment. Our experiments suggest that LLMs, when prompted with a consistent chain of thought, have great potential for completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4, without any training, outperforms all state-of-the-art RL baselines trained for 1M steps. Finally, we show the potential of games as a test bed for LLMs.
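The question-DAG procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three questions, the `ACTIONS` list, the prompt layout, the `to_action` substring matching, and the `fake_llm` stub, which stands in for a real LLM call so the example runs offline. Only the overall shape follows the abstract: questions are nodes, dependencies are edges, nodes are answered in topological order, and the answer to the final node is mapped to an environment action.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical question DAG: each key is a question (node) and each value
# lists the questions it depends on (incoming edges). Illustrative only.
QUESTION_DEPS: Dict[str, List[str]] = {
    "What do you see?": [],
    "What is the most pressing requirement?": ["What do you see?"],
    "Which action should be taken next?": [
        "What do you see?",
        "What is the most pressing requirement?",
    ],
}

# Hypothetical action names; a real environment defines its own action set.
ACTIONS: List[str] = ["noop", "move_left", "move_right", "do"]


def to_action(answer: str) -> str:
    """Map free-form text to the first known action it mentions (assumed rule)."""
    lowered = answer.lower()
    for action in ACTIONS:
        if action in lowered:
            return action
    return "noop"  # fallback when no action name is found


def choose_action(
    deps: Dict[str, List[str]],
    final_question: str,
    context: str,
    observation: str,
    ask_llm: Callable[[str], str],
) -> str:
    """Answer every question in topological order, then convert the final answer."""
    answers: Dict[str, str] = {}
    # static_order() yields each node only after all of its dependencies.
    for question in TopologicalSorter(deps).static_order():
        prior = "\n".join(f"Q: {d}\nA: {answers[d]}" for d in deps[question])
        prompt = (
            f"{context}\n\nObservation: {observation}\n\n"
            f"{prior}\n\nQ: {question}\nA:"
        )
        answers[question] = ask_llm(prompt)
    return to_action(answers[final_question])


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call: returns canned text based on the last question."""
    question = prompt.rsplit("Q: ", 1)[-1]
    if question.startswith("Which action"):
        return "Take the action move_left to reach the tree."
    return "stub answer"


if __name__ == "__main__":
    print(choose_action(
        QUESTION_DEPS,
        "Which action should be taken next?",
        "paper LaTeX source goes here",
        "a tree to the west",
        fake_llm,
    ))
```

Passing the answers of a node's dependencies into its prompt is one simple way to make later questions build on earlier ones; swapping `fake_llm` for a real model client would leave the traversal unchanged.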
