
Efficient Model Development through Fine-tuning Transfer

March 25, 2025
Authors: Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu
cs.AI

Abstract

Modern LLMs struggle with efficient updates, as each new pretrained model version requires repeating expensive alignment processes. This challenge also applies to domain- or language-specific models, where fine-tuning on specialized data must be redone for every new base model release. In this paper, we explore the transfer of fine-tuning updates between model versions. Specifically, we derive the diff vector from one source model version, which represents the weight changes from fine-tuning, and apply it to the base model of a different target version. Through empirical evaluations on various open-weight model versions, we show that transferring diff vectors can significantly improve the target base model, often achieving performance comparable to its fine-tuned counterpart. For example, reusing the fine-tuning updates from Llama 3.0 8B leads to an absolute accuracy improvement of 10.7% on GPQA over the base Llama 3.1 8B without additional training, surpassing Llama 3.1 8B Instruct. In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving an absolute improvement of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct. Our controlled experiments reveal that fine-tuning transfer is most effective when the source and target models are linearly connected in the parameter space. Additionally, we demonstrate that fine-tuning transfer offers a stronger and more computationally efficient starting point for further fine-tuning. Finally, we propose an iterative recycling-then-finetuning approach for continuous model development, which improves both efficiency and effectiveness. Our findings suggest that fine-tuning transfer is a viable strategy to reduce training costs while maintaining model performance.
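The core operation described in the abstract, transferring a fine-tuning diff vector between model versions, amounts to element-wise arithmetic on weight tensors: subtract the source base weights from the source fine-tuned weights, then add the result to the target base weights. The snippet below is a minimal sketch in Python (PyTorch + Hugging Face Transformers), not the authors' code; the model identifiers and output path are illustrative placeholders, and it assumes the source and target versions share an identical architecture and parameter naming.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative model identifiers; substitute the versions you are transferring between.
SRC_BASE = "meta-llama/Meta-Llama-3-8B"           # source base (Llama 3.0 8B)
SRC_FT   = "meta-llama/Meta-Llama-3-8B-Instruct"  # source fine-tuned (Llama 3.0 8B Instruct)
TGT_BASE = "meta-llama/Llama-3.1-8B"              # target base (Llama 3.1 8B)

def weights(name):
    """Load a model's weights on CPU as a state dict (bf16 to limit memory)."""
    return AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).state_dict()

src_base, src_ft = weights(SRC_BASE), weights(SRC_FT)
target = AutoModelForCausalLM.from_pretrained(TGT_BASE, torch_dtype=torch.bfloat16)

# Diff vector = weight change induced by fine-tuning the source version.
# Transfer = add that change to the target base, parameter by parameter
# (assumes matching parameter names and shapes across versions).
merged = {}
for param_name, w in target.state_dict().items():
    diff = src_ft[param_name].float() - src_base[param_name].float()
    merged[param_name] = (w.float() + diff).to(w.dtype)

target.load_state_dict(merged)
target.save_pretrained("llama-3.1-8b-recycled")  # hypothetical output path
```

The resulting checkpoint can be evaluated directly (the "without additional training" setting in the abstract) or used as a stronger, cheaper starting point for further fine-tuning, as in the recycling-then-finetuning procedure the paper proposes.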
