DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
June 17, 2024
Authors: Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria
cs.AI
Abstract
With the proliferation of domain-specific models, model merging has emerged
as a set of techniques that combine the capabilities of multiple models into
one that can multitask without the cost of additional training. In this paper,
we propose a new model merging technique, Drop and rEscaLe via sampLing with
mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE,
which shows significant advantages over DARE and TIES. MAGPRUNE first ranks
the parameters by their magnitude and assigns higher dropout probabilities (p)
to lower-ranked parameters, i.e., those with lower magnitudes. To approximate
the original embeddings, MAGPRUNE rescales the parameters that survive the
random dropping by 1/(1 - p). On three
different expert models considered for merging (LM, Math, Code) and
corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an
average improvement of 2.4 points over baseline methods employing delta
parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over
DARE), and 11.1 points over the no-pruning baseline (TA). We release the source
code at: https://github.com/declare-lab/della.
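A minimal sketch of what such magnitude-based drop-and-rescale could look like on a single delta (task-vector) tensor, assuming a linear mapping from magnitude rank to drop probability; the function name magprune and the p_min/p_max bounds are illustrative assumptions, not the released DELLA implementation.

```python
import torch

def magprune(delta: torch.Tensor, p_min: float = 0.1, p_max: float = 0.9) -> torch.Tensor:
    """Illustrative magnitude-based drop-and-rescale on one delta tensor.

    Lower-magnitude parameters receive higher drop probabilities, and the
    survivors are rescaled by 1 / (1 - p) so each parameter keeps its
    original value in expectation. The linear rank-to-probability mapping
    and the (p_min, p_max) bounds are assumptions of this sketch.
    """
    flat = delta.flatten()
    n = flat.numel()
    # Rank parameters by magnitude: rank 0 = smallest |value|.
    ranks = torch.argsort(torch.argsort(flat.abs())).float()
    # Lower rank (smaller magnitude) -> higher drop probability p.
    drop_p = p_max - (p_max - p_min) * ranks / max(n - 1, 1)
    # Sample which parameters survive the random dropping.
    keep_mask = torch.bernoulli(1.0 - drop_p)
    # Rescale survivors by 1 / (1 - p) to approximate the original embeddings.
    pruned = flat * keep_mask / (1.0 - drop_p)
    return pruned.reshape(delta.shape)

# Example: prune an expert's delta parameters before merging into the base model.
base = torch.randn(4, 8)
expert = base + 0.01 * torch.randn(4, 8)
merged = base + magprune(expert - base)
```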