PokéChamp: an Expert-level Minimax Language Agent
March 6, 2025
Authors: Seth Karten, Andy Luu Nguyen, Chi Jin
cs.AI
Abstract
We introduce PokéChamp, a minimax agent powered by Large Language Models (LLMs) for Pokémon battles. Built on a general framework for two-player competitive games, PokéChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key modules: (1) player action sampling, (2) opponent modeling, and (3) value function estimation, enabling the agent to effectively utilize gameplay history and human knowledge to reduce the search space and address partial observability. Notably, our framework requires no additional LLM training. We evaluate PokéChamp in the popular Gen 9 OU format. When powered by GPT-4o, it achieves a win rate of 76% against the best existing LLM-based bot and 84% against the strongest rule-based bot, demonstrating its superior performance. Even with an open-source 8-billion-parameter Llama 3.1 model, PokéChamp consistently outperforms the previous best LLM-based bot, Pokéllmon powered by GPT-4o, with a 64% win rate. PokéChamp attains a projected Elo of 1300-1500 on the Pokémon Showdown online ladder, placing it among the top 30%-10% of human players. In addition, this work compiles the largest real-player Pokémon battle dataset, featuring over 3 million games, including more than 500k high-Elo matches. Based on this dataset, we establish a series of battle benchmarks and puzzles to evaluate specific battling skills. We further provide key updates to the local game engine. We hope this work fosters further research that leverages Pokémon battles as a benchmark for integrating LLM technologies with game-theoretic algorithms addressing general multiagent problems. Videos, code, and the dataset are available at https://sites.google.com/view/pokechamp-llm.
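
The abstract states that LLM calls replace player action sampling, opponent modeling, and value function estimation inside a minimax tree search. The sketch below is a minimal illustration of how those three modules could slot into a depth-limited minimax loop, assuming hypothetical helper functions (llm_propose_actions, llm_opponent_actions, llm_value, simulate) that are not the authors' actual API; it is meant to clarify the structure, not reproduce the paper's implementation.

```python
# Hypothetical sketch of an LLM-augmented, depth-limited minimax search.
# The three LLM-backed callables stand in for (1) player action sampling,
# (2) opponent modeling, and (3) value function estimation; their names and
# signatures are illustrative assumptions only.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Node:
    state: object          # battle state (partially observable in practice)
    history: List[str]     # textual gameplay history fed into the LLM prompts


def minimax(node: Node,
            depth: int,
            llm_propose_actions: Callable[[Node], List[str]],
            llm_opponent_actions: Callable[[Node], List[str]],
            llm_value: Callable[[Node], float],
            simulate: Callable[[Node, str, str], Node]) -> Tuple[float, str]:
    """Depth-limited minimax where LLM calls prune and evaluate the search.

    - llm_propose_actions: samples a small set of promising player moves.
    - llm_opponent_actions: predicts likely opponent responses.
    - llm_value: scores leaf states once the depth budget is exhausted.
    - simulate: advances a local game engine by one joint action.
    """
    if depth == 0:
        return llm_value(node), ""          # (3) value estimation at the leaf

    best_value, best_action = float("-inf"), ""
    for our_action in llm_propose_actions(node):        # (1) action sampling
        # Assume the worst case over the opponent moves the model deems likely.
        worst = float("inf")
        for opp_action in llm_opponent_actions(node):    # (2) opponent modeling
            child = simulate(node, our_action, opp_action)
            value, _ = minimax(child, depth - 1,
                               llm_propose_actions,
                               llm_opponent_actions,
                               llm_value,
                               simulate)
            worst = min(worst, value)
        if worst > best_value:
            best_value, best_action = worst, our_action
    return best_value, best_action
```

Because each LLM call returns only a handful of candidate moves rather than the full legal action set, the branching factor stays small, which is how the framework can search usefully without any additional LLM training.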