
Grandmaster-Level Chess Without Search

February 7, 2024
Authors: Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid, Tim Genewein
cs.AI

Abstract

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
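The annotation step described in the abstract, scoring moves in each position with Stockfish 16 to obtain action-value training targets, can be illustrated with a minimal sketch. This is not the authors' published pipeline: the function name annotate_board, the per-move time limit, and the use of centipawn scores as stand-in action-values are assumptions for illustration, using the python-chess library and a local Stockfish binary.

```python
# Hypothetical sketch of per-move Stockfish annotation (not the authors' actual code).
# Assumes python-chess is installed and a Stockfish binary is available on PATH.
import chess
import chess.engine

def annotate_board(engine: chess.engine.SimpleEngine,
                   board: chess.Board,
                   limit: chess.engine.Limit) -> dict[str, int]:
    """Return a centipawn value (from the side to move) for every legal move."""
    values = {}
    for move in board.legal_moves:
        # Restrict the search to a single root move so the score reflects that action.
        info = engine.analyse(board, limit, root_moves=[move])
        values[move.uci()] = info["score"].pov(board.turn).score(mate_score=100_000)
    return values

if __name__ == "__main__":
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # e.g. a Stockfish 16 binary
    board = chess.Board()  # starting position; in the paper, boards come from Lichess games
    print(annotate_board(engine, board, chess.engine.Limit(time=0.05)))
    engine.quit()
```

The sketch only shows the raw engine query that produces per-move targets; in the paper's setup these action-values serve as supervision for the 270M-parameter transformer, which at inference time simply picks a move from its predicted values without any explicit search.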