Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
June 13, 2023
Authors: Abraham J. Fetterman, Ellie Kitanidis, Joshua Albrecht, Zachary Polizzi, Bryden Fogelman, Maksis Knutins, Bartosz Wróblewski, James B. Simon, Kanjun Qiu
cs.AI
Abstract
Hyperparameter tuning of deep learning models can lead to order-of-magnitude
performance gains for the same amount of compute. Despite this, systematic
tuning is uncommon, particularly for large models, which are expensive to
evaluate and tend to have many hyperparameters, necessitating difficult
judgment calls about tradeoffs, budgets, and search bounds. To address these
issues and propose a practical method for robustly tuning large models, we
present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian
optimization algorithm that performs local search around the performance-cost
Pareto frontier. CARBS does well even in unbounded search spaces with many
hyperparameters, learns scaling relationships so that it can tune models even
as they are scaled up, and automates much of the "black magic" of tuning. Among
our results, we effectively solve the entire ProcGen benchmark just by tuning a
simple baseline (PPO, as provided in the original ProcGen paper). We also
reproduce the model size vs. training tokens scaling result from the Chinchilla
project (Hoffmann et al. 2022), while simultaneously discovering scaling laws
for every other hyperparameter, via an easy automated process that uses
significantly less compute and is applicable to any deep learning problem (not
just language models).
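
The abstract characterizes CARBS as Bayesian optimization that performs local search around the performance-cost Pareto frontier in an unbounded search space. The sketch below is a deliberately minimal toy illustration of that idea only, not the authors' CARBS implementation: it omits the surrogate model and acquisition function a real Bayesian optimizer would use, and the toy objective, parameter names (width, depth, lr), and perturbation scale are assumptions made purely for illustration. It keeps a Pareto front of observed (performance, cost) points and proposes new candidates by multiplicative perturbation of front members, so no hand-chosen search bounds are needed and the front naturally extends toward larger, more expensive configurations as the search proceeds.

```python
"""
Toy sketch of cost-aware local search around a performance-cost Pareto front.
NOT the authors' CARBS implementation: no Gaussian-process surrogate or
acquisition function; objective, parameter names, and scales are illustrative.
"""
import math
import random

random.seed(0)


def toy_objective(params):
    """Hypothetical stand-in for a training run: returns (performance, cost)."""
    width, depth, lr = params["width"], params["depth"], params["lr"]
    cost = width * depth                                   # pretend compute cost
    perf = math.log(width * depth) - 0.1 * (math.log(lr) + 3.0) ** 2
    return perf, cost


def pareto_front(observations):
    """Keep observations that are not dominated in (higher perf, lower cost)."""
    front = []
    for obs in observations:
        dominated = any(
            other["perf"] >= obs["perf"]
            and other["cost"] <= obs["cost"]
            and (other["perf"] > obs["perf"] or other["cost"] < obs["cost"])
            for other in observations
        )
        if not dominated:
            front.append(obs)
    return front


def propose_near_front(front, scale=0.3):
    """Local search step: multiplicatively perturb a random front member.

    Moves are Gaussian in log space, so no box constraints are required and
    the search space is effectively unbounded.
    """
    anchor = random.choice(front)["params"]
    return {k: v * math.exp(random.gauss(0.0, scale)) for k, v in anchor.items()}


def run_toy_search(n_iters=40):
    # Seed with one cheap configuration, then repeatedly expand the front.
    params = {"width": 32.0, "depth": 2.0, "lr": 1e-2}
    perf, cost = toy_objective(params)
    observations = [{"params": params, "perf": perf, "cost": cost}]

    for _ in range(n_iters):
        candidate = propose_near_front(pareto_front(observations))
        perf, cost = toy_objective(candidate)
        observations.append({"params": candidate, "perf": perf, "cost": cost})

    return pareto_front(observations)


if __name__ == "__main__":
    for obs in sorted(run_toy_search(), key=lambda o: o["cost"]):
        print(f"cost={obs['cost']:9.1f}  perf={obs['perf']:6.3f}  params={obs['params']}")
```

Printing the final front sorted by cost shows how such a search traces out performance as a function of compute, which is the kind of relationship the paper exploits to learn scaling behavior while tuning.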