Efficient Model Development through Fine-tuning Transfer

Abstract

Modern large language models (LLMs) struggle with efficient updates, as each new pretrained model version requires repeating expensive alignment processes. This challenge also applies to domain- or language-specific models, where fine-tuning on specialized data must be redone for every new base model release. In this paper, we explore the transfer of fine-tuning updates between model versions. Specifically, we derive the diff vector, which represents the weight changes from fine-tuning, from one source model version and apply it to the base model of a different target version. Through empirical evaluations on various open-weight model versions, we show that transferring diff vectors can significantly improve the target base model, often achieving performance comparable to its fine-tuned counterpart. For example, reusing the fine-tuning updates from Llama 3.0 8B leads to an absolute accuracy improvement of 10.7% on GPQA over the base Llama 3.1 8B without additional training, surpassing Llama 3.1 8B Instruct. In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving absolute improvements of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct. Our controlled experiments reveal that fine-tuning transfer is most effective when the source and target models are linearly connected in the parameter space. Additionally, we demonstrate that fine-tuning transfer offers a stronger and more computationally efficient starting point for further fine-tuning. Finally, we propose an iterative recycling-then-finetuning approach for continuous model development, which improves both efficiency and effectiveness. Our findings suggest that fine-tuning transfer is a viable strategy to reduce training costs while maintaining model performance.
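The transfer step described in the abstract can be illustrated with a minimal sketch: the diff vector is the fine-tuned source weights minus the source base weights, and it is added to the target base weights without any training. The code below is an illustration under stated assumptions, not the authors' implementation. The `StateDict` type, the function names `diff_vector` and `apply_diff`, the parameter names, and the toy values are all hypothetical; a real checkpoint would hold tensors rather than Python lists, and the sketch assumes the source and target versions share the same parameter names and shapes.

```python
from typing import Dict, List

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def diff_vector(finetuned: StateDict, base: StateDict) -> StateDict:
    """Per-parameter weight change introduced by fine-tuning the source model."""
    return {
        name: [ft - b for ft, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_diff(target_base: StateDict, diff: StateDict) -> StateDict:
    """Add the source diff vector to the target base weights, with no training."""
    merged: StateDict = {}
    for name, weights in target_base.items():
        # Assumption: both model versions share parameter names and shapes.
        if name not in diff or len(diff[name]) != len(weights):
            raise ValueError(f"parameter {name!r} does not match the source model")
        merged[name] = [w + d for w, d in zip(weights, diff[name])]
    return merged


# Toy values only; they are not taken from any real checkpoint.
source_base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
source_finetuned = {"layer.weight": [1.5, 1.0], "layer.bias": [0.75]}
target_base = {"layer.weight": [2.0, 4.0], "layer.bias": [1.0]}

delta = diff_vector(source_finetuned, source_base)
target_transferred = apply_diff(target_base, delta)
```

In this toy setting, `target_transferred` plays the role of the target base model after fine-tuning transfer; according to the abstract, such a model can also serve as a starting point for further fine-tuning.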
