Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
August 20, 2024
Authors: Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu
cs.AI
Abstract
In this paper, we propose Strategist, a new method that uses LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers high-quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection; this feedback is then used to learn high-level strategic skills, such as how to evaluate states, that guide low-level execution. We showcase how our method can be used for both action planning and dialogue generation in the context of games, achieving good performance on both tasks. Specifically, we demonstrate that our method helps train agents that outperform both traditional reinforcement-learning-based approaches and other LLM-based skill-learning approaches in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.