DivMerge: A divergence-based model merging method for multi-tasking

Abstract


Multi-task learning (MTL) is often achieved by merging datasets before fine-tuning, but the growing availability of fine-tuned models has led to new approaches such as model merging via task arithmetic. A major challenge in this setting is task interference, which worsens as the number of tasks increases. We propose a method that merges models trained on different tasks into a single model, maintaining strong performance across all tasks. Our approach leverages Jensen-Shannon divergence to guide the merging process without requiring additional labelled data, and automatically balances task importance. Unlike existing methods, our approach remains robust as the number of tasks grows and consistently outperforms prior work.
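The abstract does not spell out how the divergence guides the merge. The following is therefore only a toy sketch of the general idea of divergence-guided task arithmetic. It is not the DivMerge algorithm itself.

The sketch rests on four assumptions:

- Each "model" is a linear softmax classifier with weight matrix `theta`.
- Task arithmetic forms task vectors τ_t = θ_t − θ_0 and merges them as θ = θ_0 + Σ_t λ_t τ_t.
- Task importance is represented by the per-task coefficients `lambdas`.
- Those coefficients are chosen by a coarse grid search. The search minimizes the summed Jensen-Shannon divergence between the merged model's predictions and each fine-tuned model's predictions on unlabeled inputs, so no labels are used.

The helper names `merge`, `divergence_objective` and `search_lambdas`, the toy sizes and the grid search are all hypothetical.

```python
# Toy sketch of divergence-guided task arithmetic; NOT the DivMerge implementation.
import itertools

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def js_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    # Jensen-Shannon divergence (natural log, so bounded by ln 2) between rows of p and q.
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * (np.log(p + eps) - np.log(m + eps)), axis=-1)
    kl_qm = np.sum(q * (np.log(q + eps) - np.log(m + eps)), axis=-1)
    return 0.5 * (kl_pm + kl_qm)


def merge(theta0, task_vectors, lambdas):
    # Task arithmetic: theta = theta0 + sum_t lambda_t * tau_t.
    return theta0 + sum(lam * tau for lam, tau in zip(lambdas, task_vectors))


def divergence_objective(theta, finetuned, unlabeled_inputs):
    # Hypothetical objective: sum over tasks of the mean JS divergence between
    # the merged model's predictions and each fine-tuned model's predictions
    # on that task's unlabeled inputs (no labels are needed).
    total = 0.0
    for theta_t, x_t in zip(finetuned, unlabeled_inputs):
        p_merged = softmax(x_t @ theta)
        p_expert = softmax(x_t @ theta_t)
        total += js_divergence(p_merged, p_expert).mean()
    return total


def search_lambdas(theta0, finetuned, unlabeled_inputs, grid):
    # Hypothetical coarse grid search over one coefficient per task;
    # returns (best_lambdas, best_objective).
    task_vectors = [theta_t - theta0 for theta_t in finetuned]
    best = None
    for lambdas in itertools.product(grid, repeat=len(finetuned)):
        theta = merge(theta0, task_vectors, lambdas)
        score = divergence_objective(theta, finetuned, unlabeled_inputs)
        if best is None or score < best[1]:
            best = (lambdas, score)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 8, 3  # toy input features and output classes
    theta0 = rng.normal(size=(d, k))  # stand-in for the pre-trained weights
    finetuned = [theta0 + rng.normal(scale=0.5, size=(d, k)) for _ in range(2)]
    unlabeled = [rng.normal(size=(64, d)) for _ in range(2)]  # one unlabeled set per task
    lambdas, score = search_lambdas(theta0, finetuned, unlabeled, np.linspace(0.0, 1.0, 11))
    print("lambdas:", [float(l) for l in lambdas], "objective:", round(float(score), 4))
```

The grid search grows exponentially with the number of tasks. It appears here only because it is short and deterministic. For real networks, a gradient-based optimizer over the coefficients would be the more natural choice.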
