

Grandmaster-Level Chess Without Search

February 7, 2024
Authors: Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Tim Genewein
cs.AI

Abstract

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
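To make the annotation and search-free play concrete, here is a minimal sketch (not the authors' actual pipeline) of the two ideas in the abstract: asking Stockfish for an evaluation of every legal move from a position (an "action-value" per move), and then choosing a move by a simple argmax over those values, with no explicit search over the game tree. It uses the python-chess engine API; the Stockfish binary path, the per-move time limit, and the conversion to a win-probability-style score are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch: annotate a chess position with per-move Stockfish evaluations
# and pick a move greedily (no explicit tree search). Illustrative only.
import chess
import chess.engine

STOCKFISH_PATH = "/usr/local/bin/stockfish"  # assumption: adjust to your local install


def annotate_action_values(board: chess.Board,
                           engine: chess.engine.SimpleEngine,
                           time_limit: float = 0.05) -> dict[str, float]:
    """Map each legal move (UCI string) to Stockfish's evaluation of the
    resulting position, expressed as an expected score in [0, 1] from the
    perspective of the player making the move."""
    action_values = {}
    mover = board.turn
    for move in board.legal_moves:
        board.push(move)
        info = engine.analyse(board, chess.engine.Limit(time=time_limit))
        # The engine reports the score relative to the side to move in the
        # child position; convert it back to the original mover's perspective.
        score = info["score"].pov(mover)
        action_values[move.uci()] = score.wdl().expectation()
        board.pop()
    return action_values


if __name__ == "__main__":
    with chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
        board = chess.Board()
        values = annotate_action_values(board, engine)
        # Search-free move selection: argmax over the per-move action-values.
        best_move = max(values, key=values.get)
        print(best_move, values[best_move])
```

In the paper's setup the argmax is taken over action-values predicted by the trained transformer rather than queried from Stockfish at play time; the sketch above only mirrors the shape of the annotation and the greedy, search-free decision rule.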