Predicting Emergent Capabilities by Finetuning

Abstract

A fundamental open challenge in modern LLM scaling is the lack of understanding around emergent capabilities. Language model pretraining loss is known to be highly predictable as a function of compute. Downstream capabilities, however, are far less predictable and sometimes even exhibit emergent jumps, which makes it challenging to anticipate the capabilities of future models. In this work, we first pose the task of emergence prediction: given access to current LLMs that have random few-shot accuracy on a task, can we predict whether future models (GPT-N+1) will have non-trivial accuracy on that task? We then discover a simple insight for this problem: finetuning LLMs on a given task can shift the point in scaling at which emergence occurs towards less capable models. To operationalize this insight, we finetune LLMs with varying amounts of data and fit a parametric function that predicts when emergence will occur (i.e., "emergence laws"). We validate this approach using four standard NLP benchmarks where large-scale open-source LLMs already demonstrate emergence (MMLU, GSM8K, CommonsenseQA, and CoLA). Using only small-scale LLMs, we find that, in some cases, we can accurately predict whether models trained with up to 4x more compute have emerged. Finally, we present a case study of two realistic uses for emergence prediction.
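
To make the idea of fitting a parametric function to locate an emergence point more concrete, here is a minimal sketch. It is not the paper's emergence law: the abstract does not give the functional form, so the hinge shape, the function names (`hinge`, `fit_emergence_point`), the choice of pretraining loss as the x-axis, and all numbers are assumptions made purely for illustration. The data are synthetic, not results from the paper. The sketch also covers only a single setting; it does not model how the emergence point shifts with the amount of finetuning data.

```python
import numpy as np


def hinge(loss, chance, slope, loss_e):
    """Assumed toy curve: accuracy sits at chance until the pretraining
    loss drops below loss_e, then rises linearly as loss decreases."""
    return chance + slope * np.maximum(0.0, loss_e - loss)


def fit_emergence_point(loss, acc, chance, n_grid=400):
    """Grid-search the breakpoint loss_e; for each candidate, the
    least-squares slope has a closed form. Returns (loss_e, slope)."""
    best = None
    for loss_e in np.linspace(loss.min(), loss.max(), n_grid):
        x = np.maximum(0.0, loss_e - loss)
        denom = np.dot(x, x)
        if denom == 0.0:  # no point lies past this breakpoint
            continue
        slope = np.dot(x, acc - chance) / denom
        err = np.sum((chance + slope * x - acc) ** 2)
        if best is None or err < best[0]:
            best = (err, loss_e, slope)
    return best[1], best[2]


# Synthetic (made-up) data: 25 hypothetical models with decreasing loss.
rng = np.random.default_rng(0)
chance, true_slope, true_loss_e = 0.25, 0.5, 2.4
loss = np.linspace(3.0, 1.8, 25)
acc = hinge(loss, chance, true_slope, true_loss_e)
acc = acc + rng.normal(0.0, 0.005, loss.size)

loss_e_hat, slope_hat = fit_emergence_point(loss, acc, chance)
print(f"estimated emergence point (loss): {loss_e_hat:.3f}")
print(f"estimated post-emergence slope:   {slope_hat:.3f}")
```

Under these assumptions, the fitted `loss_e_hat` is the loss at which accuracy first departs from chance. In the setting the abstract describes, one would repeat such a fit for several finetuning data amounts and extrapolate how the breakpoint moves, but the form of that extrapolation is not specified here.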
