Efficient Model Development through Fine-tuning Transfer

Abstract

Modern large language models (LLMs) are costly to update: each new pretrained model version requires repeating expensive alignment processes. The same problem affects domain- and language-specific models, where fine-tuning on specialized data must be redone for every new base model release. In this paper, we explore transferring fine-tuning updates between model versions. Specifically, we derive a diff vector from a source model version, representing the weight changes introduced by fine-tuning, and apply it to the base model of a different target version. Through empirical evaluations on various open-weight model versions, we show that transferring diff vectors can significantly improve the target base model, often achieving performance comparable to its fine-tuned counterpart. For example, reusing the fine-tuning updates from Llama 3.0 8B yields an absolute accuracy improvement of 10.7% on GPQA over the base Llama 3.1 8B without additional training, surpassing Llama 3.1 8B Instruct. In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving absolute improvements of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct. Our controlled experiments reveal that fine-tuning transfer is most effective when the source and target models are linearly connected in parameter space. Additionally, we demonstrate that fine-tuning transfer offers a stronger and more computationally efficient starting point for further fine-tuning. Finally, we propose an iterative recycling-then-finetuning approach for continuous model development, which improves both efficiency and effectiveness. Our findings suggest that fine-tuning transfer is a viable strategy for reducing training costs while maintaining model performance.
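The diff-vector transfer described in the abstract can be illustrated with a minimal sketch. It assumes that the source and target models share the same architecture and parameter names, and it uses plain Python lists as stand-ins for real weight tensors. The function names (diff_vector, apply_diff), the parameter name "layer.weight", and the toy values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def diff_vector(fine_tuned: Weights, base: Weights) -> Weights:
    """Return fine_tuned - base for every parameter of the source version."""
    if fine_tuned.keys() != base.keys():
        raise ValueError("source checkpoints must share parameter names")
    diff: Weights = {}
    for name, base_values in base.items():
        tuned_values = fine_tuned[name]
        if len(tuned_values) != len(base_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        diff[name] = [t - b for t, b in zip(tuned_values, base_values)]
    return diff


def apply_diff(target_base: Weights, diff: Weights) -> Weights:
    """Return target_base + diff, with no additional training."""
    if target_base.keys() != diff.keys():
        raise ValueError("target and diff must share parameter names")
    merged: Weights = {}
    for name, base_values in target_base.items():
        delta = diff[name]
        if len(delta) != len(base_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [w + d for w, d in zip(base_values, delta)]
    return merged


# Toy values chosen for illustration only (not real model weights).
source_base = {"layer.weight": [1.0, -2.0, 0.5]}      # e.g. an older base model
source_tuned = {"layer.weight": [1.5, -2.25, 0.5]}    # its fine-tuned version
target_base = {"layer.weight": [1.25, -1.75, 0.75]}   # e.g. a newer base model

delta = diff_vector(source_tuned, source_base)
target_transferred = apply_diff(target_base, delta)
```

In practice the same element-wise arithmetic would be applied to every tensor in the model checkpoints. Whether the transferred model performs well depends on the conditions the abstract describes, such as the source and target models being linearly connected in parameter space.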

