Grandmaster-Level Chess Without Search

Abstract

The recent breakthrough successes in machine learning are mainly attributed to scale, namely large attention-based architectures and datasets of unprecedented size. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M-parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
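The abstract names two mechanisms without spelling them out. First, every board is annotated with per-move action-values. Second, the trained model plays without search. Below is a minimal Python sketch of both ideas. It assumes that one data point is a single (board, move, value) triple and that the model plays by taking the legal move with the highest predicted action-value. `ActionValueExample`, `annotate`, `searchless_move`, and the `engine_value`/`predict_value` callables are hypothetical names. `engine_value` stands in for Stockfish 16 and `predict_value` for the trained transformer. The win-probability value scale is also an assumption, and the toy numbers are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical placeholder types: a real system would use a chess library's
# board and move objects (e.g. FEN strings and UCI moves).
Board = str
Move = str


@dataclass(frozen=True)
class ActionValueExample:
    """One supervised data point: (board, move) -> engine action-value."""
    board: Board
    move: Move
    value: float  # assumption: a win probability in [0, 1]


def annotate(
    boards: Sequence[Board],
    legal_moves: Callable[[Board], List[Move]],
    engine_value: Callable[[Board, Move], float],
) -> List[ActionValueExample]:
    """Label every legal move of every board with an engine action-value.

    Each (board, move) pair is a separate data point, so the dataset grows
    with (boards per game) x (legal moves per board), not just with games.
    """
    return [
        ActionValueExample(board, move, engine_value(board, move))
        for board in boards
        for move in legal_moves(board)
    ]


def searchless_move(
    board: Board,
    legal_moves: Callable[[Board], List[Move]],
    predict_value: Callable[[Board, Move], float],
) -> Move:
    """Pick the legal move with the highest predicted action-value.

    There is no tree search: the model is queried once per legal move of
    the current board and never on positions further down the game tree.
    """
    moves = legal_moves(board)
    if not moves:
        raise ValueError("no legal moves: checkmate or stalemate")
    return max(moves, key=lambda move: predict_value(board, move))


# Toy usage with made-up numbers (not Stockfish output):
toy_moves = {"pos0": ["m1", "m2", "m3"]}
toy_values = {("pos0", "m1"): 0.2, ("pos0", "m2"): 0.7, ("pos0", "m3"): 0.4}


def lookup(board: Board, move: Move) -> float:
    """Hypothetical stand-in for both the engine and the trained model."""
    return toy_values[(board, move)]


data = annotate(["pos0"], toy_moves.__getitem__, lookup)
best = searchless_move("pos0", toy_moves.__getitem__, lookup)
print(len(data), best)  # 3 m2
```

The sketch shows why a modest number of games can yield many data points: each board contributes one example per legal move. It also shows what "without explicit search" means under these assumptions. Playing strength must come from the learned action-values of the current position rather than from looking ahead.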
