Benchmarking Neural Network Training Algorithms

Abstract

Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate models. Unfortunately, as a community, we are currently unable to reliably identify training algorithm improvements, or even determine the state-of-the-art training algorithm. In this work, using concrete experiments, we argue that real progress in speeding up training requires new benchmarks that resolve three basic challenges faced by empirical comparisons of training algorithms: (1) how to decide when training is complete and precisely measure training time, (2) how to handle the sensitivity of measurements to exact workload details, and (3) how to fairly compare algorithms that require hyperparameter tuning. To address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of workload variants that make it possible to detect benchmark submissions that are more robust to workload changes than current widely-used methods. Finally, we evaluate baseline submissions constructed using various optimizers that represent current practice, as well as other optimizers that have recently received attention in the literature. These baseline results collectively demonstrate the feasibility of our benchmark, show that non-trivial gaps between methods exist, and set a provisional state-of-the-art for future benchmark submissions to try to surpass.
