

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

June 17, 2024
Authors: Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria
cs.AI

Abstract

With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.
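The pruning step described above (rank delta parameters by magnitude, drop low-magnitude parameters with higher probability p, then rescale survivors by 1/(1 - p)) can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact implementation: the linear rank-to-probability mapping and the p_min/p_max bounds are illustrative assumptions; see the released source code for the actual method.

```python
import torch

def magprune_sketch(delta, p_min=0.1, p_max=0.9, eps=1e-8):
    """Magnitude-based stochastic pruning of delta parameters (illustrative sketch).

    delta: tensor of delta parameters (fine-tuned weights minus base weights).
    Returns a pruned and rescaled delta tensor of the same shape.
    """
    flat = delta.flatten()
    # Rank parameters by magnitude: rank 0 = smallest magnitude.
    ranks = flat.abs().argsort().argsort().float()
    # Lower magnitude (lower rank) -> higher drop probability p (linear mapping assumed here).
    p = p_max - (p_max - p_min) * ranks / max(flat.numel() - 1, 1)
    # Drop each parameter with probability p.
    keep_mask = torch.rand_like(flat) >= p
    # Rescale survivors by 1/(1 - p) so the expected value matches the original.
    pruned = torch.where(keep_mask, flat / (1.0 - p + eps), torch.zeros_like(flat))
    return pruned.view_as(delta)

# Usage sketch: prune each expert's delta before combining it with the base model, e.g.
# merged = base + sum(magprune_sketch(expert - base) for expert in experts) / len(experts)
```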
