Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning

Abstract

Influence functions provide crucial insights into model training, but existing methods suffer from high computational costs and limited generalization. In particular, recent works have proposed various metrics and algorithms that use language models to calculate the influence of data, and these do not scale well to large models and datasets. The reasons are the expensive forward and backward passes required for computation, the substantial memory needed to store large models, and the poor generalization of influence estimates to new data. In this paper, we explore the use of small neural networks, which we refer to as the InfluenceNetwork, to estimate influence values, achieving up to 99% cost reduction. Our evaluation demonstrates that influence values can be estimated with models just 0.0027% the size of full language models (we use 7B and 8B versions). We apply our algorithm for estimating influence values (called NN-CIFT: Neural Networks for effiCient Instruction Fine-Tuning) to the downstream task of subset selection for general instruction fine-tuning. In our study, we include four state-of-the-art influence functions and show no compromise in performance between NN-CIFT and the original influence functions, despite large speedups. We provide an in-depth hyperparameter analysis of NN-CIFT. The code for our method can be found here: https://github.com/agarwalishika/NN-CIFT.
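To give a sense of scale for the 0.0027% figure, the arithmetic can be sketched as below. This is only an illustration under stated assumptions. It treats "7B" and "8B" as exactly 7 and 8 billion parameters, and it uses a hypothetical two-layer MLP with a hypothetical input width of 2048. Neither the layer layout nor the input width is taken from the abstract, and the names `param_budget`, `mlp_params`, and `max_hidden` are invented for this sketch.

```python
# Nominal parameter counts; real 7B/8B checkpoints differ somewhat (assumption).
NOMINAL_LM_PARAMS = {"7B": 7_000_000_000, "8B": 8_000_000_000}

# 0.0027% as an exact integer ratio: 0.0027 / 100 = 27 / 1_000_000.
RATIO_NUM, RATIO_DEN = 27, 1_000_000


def param_budget(lm_params: int) -> int:
    """Parameter count equal to 0.0027% of a language model's size."""
    return lm_params * RATIO_NUM // RATIO_DEN


def mlp_params(d_in: int, hidden: int) -> int:
    """Weights plus biases of a hypothetical d_in -> hidden -> 1 MLP."""
    return (d_in * hidden + hidden) + (hidden + 1)


def max_hidden(d_in: int, budget: int) -> int:
    """Largest hidden width whose MLP fits in the budget.

    Uses mlp_params = hidden * (d_in + 2) + 1, solved for hidden.
    """
    return (budget - 1) // (d_in + 2)


if __name__ == "__main__":
    d_in = 2048  # hypothetical input width, not stated in the abstract
    for name, n_params in NOMINAL_LM_PARAMS.items():
        budget = param_budget(n_params)
        print(name, budget, max_hidden(d_in, budget))
```

Under these assumptions the budget comes to roughly 189,000 parameters for a nominal 7B model and 216,000 for a nominal 8B model, which corresponds to a hidden layer of about a hundred units at this input width.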
