Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning

Abstract

Influence functions provide crucial insights into model training, but existing methods suffer from high computational cost and limited generalization. In particular, recent work has proposed various metrics and algorithms that use language models to calculate the influence of data, but these do not scale well to large models and datasets. The reasons are the expensive forward and backward passes required for computation, the substantial memory needed to store large models, and the poor generalization of influence estimates to new data. In this paper, we explore the use of small neural networks, which we refer to as the InfluenceNetwork, to estimate influence values, achieving up to 99% cost reduction. Our evaluation demonstrates that influence values can be estimated with models just 0.0027% the size of full language models (we use 7B and 8B versions). We apply our algorithm for estimating influence values, called NN-CIFT (Neural Networks for effiCient Instruction Fine-Tuning), to the downstream task of subset selection for general instruction fine-tuning. Our study includes four state-of-the-art influence functions and shows no compromise in performance between NN-CIFT and the original influence functions, despite large speedups. We provide an in-depth hyperparameter analysis of NN-CIFT. The code for our method can be found here: https://github.com/agarwalishika/NN-CIFT.
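
To give a sense of scale for the 0.0027% figure, the following is a rough arithmetic sketch, not taken from the paper's code. It assumes that "7B" and "8B" mean exactly 7 and 8 billion parameters, which real checkpoints only approximate, and the function name `implied_parameter_budget` is hypothetical.

```python
# Back-of-the-envelope check of the size ratio quoted in the abstract.
# ASSUMPTION: "7B" and "8B" are treated as exactly 7e9 and 8e9 parameters;
# actual model checkpoints differ somewhat from these nominal counts.

# 0.0027% = 0.0027 / 100 = 27 parts per million. Integer arithmetic is used
# so the result is exact and free of floating-point rounding.
RATIO_PPM = 27


def implied_parameter_budget(full_model_params: int, ratio_ppm: int = RATIO_PPM) -> int:
    """Return the parameter count that is `ratio_ppm` millionths of the full model."""
    return full_model_params * ratio_ppm // 1_000_000


# Nominal sizes of the two language-model scales mentioned in the abstract.
NOMINAL_SIZES = {"7B": 7_000_000_000, "8B": 8_000_000_000}

budgets = {name: implied_parameter_budget(size) for name, size in NOMINAL_SIZES.items()}

for name, budget in budgets.items():
    print(f"{name}: about {budget:,} parameters at 0.0027% of nominal size")
```

Under these assumptions, the estimator would have roughly 189,000 to 216,000 parameters, which suggests why its forward pass is so much cheaper than one through the full language model.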
