No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
February 7, 2025
Authors: Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
cs.AI
Abstract
Model merging integrates the weights of multiple task-specific models into a
single multi-task model. Despite recent interest in the problem, a significant
performance gap between the combined and single-task models remains. In this
paper, we investigate the key characteristics of task matrices -- weight update
matrices applied to a pre-trained model -- that enable effective merging. We
show that alignment between singular components of task-specific and merged
matrices strongly correlates with performance improvement over the pre-trained
model. Based on this, we propose an isotropic merging framework that flattens
the singular value spectrum of task matrices, enhances alignment, and reduces
the performance gap. Additionally, we incorporate both common and task-specific
subspaces to further improve alignment and performance. Our proposed approach
achieves state-of-the-art performance across multiple scenarios, including
various sets of tasks and model scales. This work advances the understanding of
model merging dynamics, offering an effective methodology to merge models
without requiring additional training. Code is available at
https://github.com/danielm1405/iso-merging.
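To make the core idea concrete, here is a minimal sketch of the spectrum-flattening step for a single weight matrix, written in PyTorch. The function name isotropic_merge and the specific choice of replacing all singular values with their mean are illustrative assumptions based on the abstract, not the authors' released implementation (see the linked repository for that).

# Minimal sketch of isotropic merging for one weight matrix.
# Assumption: "flattening the singular value spectrum" is realized here
# by setting all singular values of the summed task update to their mean.
import torch

def isotropic_merge(w_pretrained: torch.Tensor,
                    task_weights: list[torch.Tensor]) -> torch.Tensor:
    """Merge task-specific weights by flattening the singular value
    spectrum of the summed task matrices (weight updates)."""
    # Task matrices: per-task weight updates relative to the pre-trained model.
    task_matrices = [w - w_pretrained for w in task_weights]
    # Sum the updates (as in task arithmetic), then decompose.
    merged_update = torch.stack(task_matrices).sum(dim=0)
    u, s, vh = torch.linalg.svd(merged_update, full_matrices=False)
    # Flatten the spectrum: every retained singular direction gets equal
    # weight, so no task's dominant directions drown out the others.
    s_iso = torch.full_like(s, s.mean().item())
    return w_pretrained + u @ torch.diag(s_iso) @ vh

In the full method this operation would presumably be applied layer-wise across the network, and the common/task-specific variant described in the abstract additionally reserves subspace directions for individual tasks alongside the shared subspace to further improve alignment.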