Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
February 6, 2025
Authors: Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu
cs.AI
Abstract
Model merging aggregates Large Language Models (LLMs) fine-tuned on different tasks into a single stronger model. However, parameter conflicts between models lead to performance degradation when averaging. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs and fails to leverage common knowledge across models. In this work, we observe that different layers exhibit varying levels of parameter conflict. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task-arithmetic sparsity, we decouple the multiple fine-tuned experts into a single dense expert and several sparse experts. To handle out-of-distribution samples, we select and merge the appropriate experts based on the task uncertainty of the input data. We conduct extensive experiments on both LLaMA and Qwen at varying parameter scales and evaluate on real-world reasoning tasks. Results show that our method consistently achieves significant performance improvements while requiring less system cost than existing methods.
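The abstract describes a per-layer decision: layers whose fine-tuned weights conflict little are averaged, while high-conflict layers are kept as separate experts selected by a router at inference time. The sketch below is a minimal illustration of that decision, not the paper's reference implementation; the conflict metric (mean pairwise cosine distance between task vectors) and the threshold `tau` are assumptions chosen for clarity.

```python
# Illustrative sketch of the layer-wise merge-or-route decision described in the
# abstract. The conflict score and threshold are assumed, not taken from the paper.
import torch


def task_vectors(base_layer: torch.Tensor, expert_layers: list[torch.Tensor]) -> list[torch.Tensor]:
    """Task vector of each expert: its layer weights minus the shared base weights."""
    return [w - base_layer for w in expert_layers]


def conflict_score(vectors: list[torch.Tensor]) -> float:
    """Mean pairwise cosine distance between flattened task vectors (assumed metric)."""
    flat = [v.flatten() for v in vectors]
    dists = []
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            cos = torch.nn.functional.cosine_similarity(flat[i], flat[j], dim=0)
            dists.append(1.0 - cos.item())
    return sum(dists) / max(len(dists), 1)


def merge_or_route(base_layer: torch.Tensor, expert_layers: list[torch.Tensor], tau: float = 0.5) -> dict:
    """Average a low-conflict layer; keep a high-conflict layer as routed experts."""
    vecs = task_vectors(base_layer, expert_layers)
    if conflict_score(vecs) < tau:
        # Low conflict: average the task vectors and add them back to the base layer.
        merged = base_layer + torch.stack(vecs).mean(dim=0)
        return {"type": "averaged", "weight": merged}
    # High conflict: retain all expert weights; a task-level router selects among
    # them at inference time (uncertainty-based selection is not shown here).
    return {"type": "routed", "experts": expert_layers}
```

The storage-saving step in the abstract (decoupling the experts into one dense expert plus several sparse deltas) and the uncertainty-based expert selection would sit on top of the "routed" branch above; they are omitted from this sketch.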