Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

Abstract

In this paper, we propose Strategist, a new method that uses LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers high-quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection; this feedback is then used to learn high-level strategic skills, such as how to evaluate states, that guide low-level execution. We showcase how our method can be used for both action planning and dialogue generation in games, achieving good performance on both tasks. Specifically, we demonstrate that our method can help train agents that outperform both traditional reinforcement learning-based approaches and other LLM-based skill learning approaches in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.
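To make the idea of a state evaluator guiding low-level play concrete, the following is a minimal sketch in Python for GOPS. It is not the authors' implementation. It assumes a one-step lookahead in place of Monte Carlo tree search, assumes tied prizes are discarded (one common GOPS variant), and uses a hand-written `heuristic_value` as a stand-in for a value function that an LLM might propose. All function names and the `explore` parameter are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# State from one player's view: (my hand, opponent hand, prizes left, score lead).
State = Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...], int]
ValueFn = Callable[[State], float]


def step(state: State, my_card: int, opp_card: int, prize: int) -> State:
    """Resolve one round: the higher bid wins the prize; ties discard it (assumed rule)."""
    my_hand, opp_hand, prizes, diff = state
    if my_card > opp_card:
        diff += prize
    elif opp_card > my_card:
        diff -= prize
    return (
        tuple(c for c in my_hand if c != my_card),
        tuple(c for c in opp_hand if c != opp_card),
        tuple(p for p in prizes if p != prize),
        diff,
    )


def flip(state: State) -> State:
    """Return the same state from the other player's point of view."""
    my_hand, opp_hand, prizes, diff = state
    return (opp_hand, my_hand, prizes, -diff)


def heuristic_value(state: State) -> float:
    """Hypothetical evaluator: current lead plus a hand-strength bonus."""
    my_hand, opp_hand, prizes, diff = state
    if not prizes:
        return float(diff)
    total = max(1, sum(my_hand) + sum(opp_hand))
    edge = (sum(my_hand) - sum(opp_hand)) / total
    return diff + edge * sum(prizes)


def greedy_bid(state: State, prize: int, value_fn: ValueFn) -> int:
    """One-step lookahead: average the evaluator over all possible opponent bids."""
    my_hand, opp_hand, _, _ = state

    def expected(card: int) -> float:
        return sum(value_fn(step(state, card, o, prize)) for o in opp_hand) / len(opp_hand)

    return max(my_hand, key=expected)


def choose(state: State, prize: int, value_fn: ValueFn,
           rng: random.Random, explore: float) -> int:
    """Bid randomly with probability `explore`, otherwise follow the evaluator."""
    if rng.random() < explore:
        return rng.choice(state[0])
    return greedy_bid(state, prize, value_fn)


def self_play(n_cards: int, value_fn: ValueFn, rng: random.Random,
              explore: float = 0.3) -> List[Tuple[State, int]]:
    """Play one game and return (visited state, final score lead) pairs as feedback."""
    hand = tuple(range(1, n_cards + 1))
    order = list(hand)
    rng.shuffle(order)  # order in which prizes are revealed
    state: State = (hand, hand, tuple(order), 0)
    visited: List[State] = []
    for prize in order:
        visited.append(state)
        a = choose(state, prize, value_fn, rng, explore)
        b = choose(flip(state), prize, value_fn, rng, explore)
        state = step(state, a, b, prize)
    outcome = state[3]
    return [(s, outcome) for s in visited]
```

Calling `self_play(5, heuristic_value, random.Random(0))` returns one pair per round. Pairs in which the evaluator's estimate for a visited state differs most from the final score lead are the kind of signal a reflection step could use when revising the evaluator.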