Efficient Model Development through Fine-tuning Transfer

March 25, 2025
Authors: Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu
cs.AI

Abstract

Modern LLMs struggle with efficient updates, as each new pretrained model version requires repeating expensive alignment processes. This challenge also applies to domain- or language-specific models, where fine-tuning on specialized data must be redone for every new base model release. In this paper, we explore the transfer of fine-tuning updates between model versions. Specifically, we derive the diff vector from one source model version, which represents the weight changes from fine-tuning, and apply it to the base model of a different target version. Through empirical evaluations on various open-weight model versions, we show that transferring diff vectors can significantly improve the target base model, often achieving performance comparable to its fine-tuned counterpart. For example, reusing the fine-tuning updates from Llama 3.0 8B leads to an absolute accuracy improvement of 10.7% on GPQA over the base Llama 3.1 8B without additional training, surpassing Llama 3.1 8B Instruct. In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving an absolute improvement of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct. Our controlled experiments reveal that fine-tuning transfer is most effective when the source and target models are linearly connected in the parameter space. Additionally, we demonstrate that fine-tuning transfer offers a stronger and more computationally efficient starting point for further fine-tuning. Finally, we propose an iterative recycling-then-finetuning approach for continuous model development, which improves both efficiency and effectiveness. Our findings suggest that fine-tuning transfer is a viable strategy to reduce training costs while maintaining model performance.
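To make the core operation concrete, below is a minimal sketch of fine-tuning transfer via diff vectors using Hugging Face Transformers and PyTorch. It is an illustration under stated assumptions, not the authors' released code: the checkpoint names mirror the Llama 3.0 → 3.1 example from the abstract but may not match the authors' exact setup, and the output path is hypothetical.

```python
# Sketch: transfer fine-tuning updates between model versions via a diff vector.
# Assumes the source base, source fine-tuned, and target base models share an
# architecture and parameter names (as with Llama 3.0 8B -> Llama 3.1 8B).
import torch
from transformers import AutoModelForCausalLM


def load_state(name: str) -> dict:
    # Load weights on CPU in bf16 to keep memory manageable for 8B models.
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    return model.state_dict()


# Source version: base and its fine-tuned (instruct) counterpart.
source_base = load_state("meta-llama/Meta-Llama-3-8B")           # assumed checkpoint
source_ft = load_state("meta-llama/Meta-Llama-3-8B-Instruct")    # assumed checkpoint

# Target version: a newer base model that has not been fine-tuned yet.
target = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16        # assumed checkpoint
)

# Diff vector = weight changes introduced by fine-tuning the source version.
# Apply it additively to the target base model; no additional training is run.
new_state = target.state_dict()
for key, w_target in new_state.items():
    if key in source_base and key in source_ft:
        new_state[key] = w_target + (source_ft[key] - source_base[key])

target.load_state_dict(new_state)
target.save_pretrained("llama-3.1-8b-recycled-instruct")  # hypothetical output path
```

The resulting model can be evaluated directly, or used as the starting point for further fine-tuning, as in the paper's recycling-then-finetuning setting.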
