Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

Abstract

In this paper, we propose Strategist, a new method that uses LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection; this feedback is then used to learn high-level strategic skills, such as how to evaluate states, that guide low-level execution. We show how our method can be applied to both action planning and dialogue generation in the context of games, achieving good performance on both tasks. Specifically, we demonstrate that our method can help train agents that outperform both traditional reinforcement learning-based approaches and other LLM-based skill learning approaches in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.
