Grandmaster-Level Chess Without Search

Abstract

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
