Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
June 13, 2023
Authors: Abraham J. Fetterman, Ellie Kitanidis, Joshua Albrecht, Zachary Polizzi, Bryden Fogelman, Maksis Knutins, Bartosz Wróblewski, James B. Simon, Kanjun Qiu
cs.AI
Abstract
Hyperparameter tuning of deep learning models can lead to order-of-magnitude
performance gains for the same amount of compute. Despite this, systematic
tuning is uncommon, particularly for large models, which are expensive to
evaluate and tend to have many hyperparameters, necessitating difficult
judgment calls about tradeoffs, budgets, and search bounds. To address these
issues and propose a practical method for robustly tuning large models, we
present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian
optimization algorithm that performs local search around the performance-cost
Pareto frontier. CARBS does well even in unbounded search spaces with many
hyperparameters, learns scaling relationships so that it can tune models even
as they are scaled up, and automates much of the "black magic" of tuning. Among
our results, we effectively solve the entire ProcGen benchmark just by tuning a
simple baseline (PPO, as provided in the original ProcGen paper). We also
reproduce the model size vs. training tokens scaling result from the Chinchilla
project (Hoffmann et al. 2022), while simultaneously discovering scaling laws
for every other hyperparameter, via an easy automated process that uses
significantly less compute and is applicable to any deep learning problem (not
just language models).
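
The abstract's central idea, local search around the performance-cost Pareto frontier, can be illustrated with a small sketch. The snippet below is not the authors' CARBS implementation; it is a minimal, hypothetical Python illustration (the function names `pareto_members` and `propose_candidates`, and the sample run data, are invented here) of how one might keep past runs, extract the cost-performance Pareto set, and propose new hyperparameters by perturbing frontier members in log-space. A real Bayesian-optimization loop would additionally fit a surrogate model and score candidates with an acquisition function.

```python
# Toy sketch (not the paper's implementation) of cost-aware local search
# around the performance-vs-cost Pareto frontier of previously observed runs.
import math
import random


def pareto_members(observations):
    """Return runs that are not dominated in (lower cost, higher performance)."""
    members = []
    for a in observations:
        dominated = any(
            b["cost"] <= a["cost"]
            and b["perf"] >= a["perf"]
            and (b["cost"] < a["cost"] or b["perf"] > a["perf"])
            for b in observations
        )
        if not dominated:
            members.append(a)
    return members


def propose_candidates(observations, n_candidates=4, sigma=0.3, rng=random):
    """Propose new hyperparameters by perturbing Pareto-frontier runs in log-space."""
    frontier = pareto_members(observations)
    candidates = []
    for _ in range(n_candidates):
        parent = rng.choice(frontier)
        # Multiplicative (log-normal) perturbation keeps positive hyperparameters positive.
        child = {
            name: value * math.exp(rng.gauss(0.0, sigma))
            for name, value in parent["params"].items()
        }
        candidates.append(child)
    return candidates


if __name__ == "__main__":
    # Hypothetical past runs: hyperparameters with their observed cost and performance.
    runs = [
        {"params": {"lr": 3e-4, "width": 256}, "cost": 1.0, "perf": 0.62},
        {"params": {"lr": 1e-4, "width": 512}, "cost": 2.1, "perf": 0.70},
        {"params": {"lr": 3e-4, "width": 512}, "cost": 2.0, "perf": 0.66},
    ]
    for candidate in propose_candidates(runs, rng=random.Random(0)):
        print(candidate)
```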