

Scaling Laws for Downstream Task Performance of Large Language Models

February 6, 2024
Authors: Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo
cs.AI

Abstract

Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the pretraining data and its size affect downstream performance (translation quality) as judged by two metrics: downstream cross-entropy and BLEU score. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and BLEU score improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream BLEU score with good accuracy using a log-law. However, there are also cases where moderate misalignment causes the BLEU score to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these observations, we provide new practical insights for choosing appropriate pretraining data.
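As a rough illustration of what predicting the downstream BLEU score from pretraining data size with a log-law could look like in practice, the sketch below fits a curve of the form (alpha * log(D_p) + c)^beta to a handful of hypothetical measurements. The functional form, the parameter names (alpha, c, beta), and all numbers are assumptions made for illustration only; they are not values or code from the paper.

```python
# Minimal sketch: fitting an assumed log-law
#   BLEU(D_p) ~ (alpha * log(D_p) + c)**beta
# to hypothetical (pretraining size, BLEU) pairs. All data and parameters
# below are illustrative assumptions, not results from the paper.
import numpy as np
from scipy.optimize import curve_fit

def log_law(d_p, alpha, c, beta):
    """Assumed log-law mapping pretraining data size d_p to BLEU score."""
    return (alpha * np.log(d_p) + c) ** beta

# Hypothetical measurements: pretraining tokens vs. BLEU after finetuning.
d_p = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
bleu = np.array([18.0, 21.5, 24.0, 26.0, 27.5])

# Fit the coefficients; the bounds keep the log term positive so the
# fractional power stays well defined during optimization.
(alpha, c, beta), _ = curve_fit(
    log_law, d_p, bleu,
    p0=[0.2, 1.0, 2.0],
    bounds=([0.0, 0.0, 0.1], [10.0, 50.0, 5.0]),
)

# Extrapolate to a larger (hypothetical) pretraining budget.
print(f"Predicted BLEU at 3e10 tokens: {log_law(3e10, alpha, c, beta):.1f}")
```

Per the abstract, an extrapolation like this is only expected to be reliable when the pretraining and downstream distributions are sufficiently aligned; under moderate misalignment, BLEU can fluctuate or degrade with more pretraining even while downstream cross-entropy keeps improving.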