PokéChamp: an Expert-level Minimax Language Agent

Abstract

We introduce PokéChamp, a minimax agent powered by Large Language Models (LLMs) for Pokémon battles. Built on a general framework for two-player competitive games, PokéChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key modules: (1) player action sampling, (2) opponent modeling, and (3) value function estimation. This lets the agent use gameplay history and human knowledge to reduce the search space and address partial observability. Notably, the framework requires no additional LLM training. We evaluate PokéChamp in the popular Gen 9 OU format. When powered by GPT-4o, it achieves a win rate of 76% against the best existing LLM-based bot and 84% against the strongest rule-based bot, demonstrating its superior performance. Even with an open-source 8-billion-parameter Llama 3.1 model, PokéChamp consistently outperforms the previous best LLM-based bot, Pokéllmon powered by GPT-4o, with a 64% win rate. PokéChamp attains a projected Elo of 1300-1500 on the Pokémon Showdown online ladder, placing it among the top 30% to 10% of human players. In addition, this work compiles the largest real-player Pokémon battle dataset, featuring over 3 million games, including more than 500k high-Elo matches. Based on this dataset, we establish a series of battle benchmarks and puzzles to evaluate specific battling skills. We further provide key updates to the local game engine. We hope this work fosters further research that leverages Pokémon battles as a benchmark for integrating LLM technologies with game-theoretic algorithms that address general multiagent problems. Videos, code, and the dataset are available at https://sites.google.com/view/pokechamp-llm.
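To make the three replaced modules concrete, the following is a minimal sketch of a depth-limited minimax search in which action sampling, opponent modeling, and value estimation are pluggable callables. It is not the authors' implementation: the abstract does not describe the interfaces or prompts, so every name here (`minimax`, `sample_actions`, `model_opponent`, `estimate_value`, and the `toy_*` helpers) is hypothetical, the simultaneous-move turn structure is an assumption, and the LLM calls are stood in for by simple hand-written functions on a toy two-player HP game.

```python
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[int, int]  # toy state: (our HP, opponent HP)
Action = str


def minimax(
    state: State,
    depth: int,
    sample_actions: Callable[[State], Sequence[Action]],   # module (1), LLM in the paper
    model_opponent: Callable[[State], Sequence[Action]],   # module (2), LLM in the paper
    estimate_value: Callable[[State], float],              # module (3), LLM in the paper
    step: Callable[[State, Action, Action], State],
    is_terminal: Callable[[State], bool],
) -> Tuple[float, Optional[Action]]:
    """Return (value, best action) for the maximizing player."""
    # At the depth limit or a finished game, fall back to the value estimate.
    if depth == 0 or is_terminal(state):
        return estimate_value(state), None

    best_value, best_action = float("-inf"), None
    for ours in sample_actions(state):
        # Assume the opponent picks the reply that is worst for us.
        worst = float("inf")
        for theirs in model_opponent(state):
            child = step(state, ours, theirs)
            value, _ = minimax(child, depth - 1, sample_actions, model_opponent,
                               estimate_value, step, is_terminal)
            worst = min(worst, value)
        if worst > best_value:
            best_value, best_action = worst, ours
    return best_value, best_action


# --- Hypothetical toy game used only to make the sketch runnable ---
MAX_HP = 10
DAMAGE = {"attack": 3, "heal": 0}
HEAL = {"attack": 0, "heal": 2}


def toy_step(state: State, ours: Action, theirs: Action) -> State:
    """Both sides act simultaneously: heal first, then take damage."""
    our_hp, opp_hp = state
    our_hp = min(MAX_HP, our_hp + HEAL[ours]) - DAMAGE[theirs]
    opp_hp = min(MAX_HP, opp_hp + HEAL[theirs]) - DAMAGE[ours]
    return (max(our_hp, 0), max(opp_hp, 0))


def toy_is_terminal(state: State) -> bool:
    return state[0] == 0 or state[1] == 0


def toy_actions(state: State) -> Sequence[Action]:
    # Stand-in for a sampler that would propose a small set of candidate moves.
    return ["attack", "heal"]


def toy_value(state: State) -> float:
    # Stand-in for a learned or LLM-based evaluation: HP difference.
    return float(state[0] - state[1])


value, action = minimax((3, 3), 1, toy_actions, toy_actions, toy_value,
                        toy_step, toy_is_terminal)
print(action, value)
```

Passing the three modules as arguments keeps the search loop independent of how candidates and values are produced, so a stub can be swapped for a model-backed function without touching the recursion. Restricting `sample_actions` and `model_opponent` to a few candidates is what shrinks the tree in this sketch.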
