Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Abstract

Model merging aggregates Large Language Models (LLMs) fine-tuned on different tasks into a single, stronger model. However, parameter conflicts between models lead to performance degradation when their weights are averaged. Model routing addresses this issue by selecting individual models during inference, but it imposes excessive storage and compute costs and fails to leverage the knowledge the models share. In this work, we observe that different layers exhibit varying levels of parameter conflict. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by the sparsity of task arithmetic, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. To account for out-of-distribution samples, we select and merge appropriate experts based on the task uncertainty of the input data. We conduct extensive experiments on both LLaMA and Qwen at varying parameter scales and evaluate on real-world reasoning tasks. Results demonstrate that our method consistently achieves significant performance improvements while requiring less system cost than existing methods.
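The layer-wise idea in the abstract (average layers with little conflict, keep per-task experts for the rest) can be illustrated with a small sketch. The abstract does not say how conflict is measured, so everything below is an assumption made for illustration: the sign-disagreement metric, the 0.3 threshold, and the names `task_vector`, `sign_conflict`, and `merge_layers` are hypothetical, and layers are plain Python lists rather than real model tensors.

```python
# Illustrative sketch only: the conflict metric and threshold are assumptions,
# not the definitions used by the Mediator paper.

def task_vector(finetuned, base):
    """Per-parameter difference between a fine-tuned layer and the base layer."""
    return [f - b for f, b in zip(finetuned, base)]

def sign_conflict(task_vectors):
    """Fraction of positions where the non-zero task-vector entries disagree in sign."""
    n = len(task_vectors[0])
    conflicts = 0
    for i in range(n):
        signs = {v[i] > 0 for v in task_vectors if v[i] != 0}
        if len(signs) > 1:
            conflicts += 1
    return conflicts / n

def merge_layers(base, experts, threshold=0.3):
    """Average low-conflict layers; keep per-task weights for high-conflict layers.

    base:    {layer_name: [float, ...]}
    experts: {task_name: {layer_name: [float, ...]}}
    Returns (merged, routed): averaged layers, and layers left to a router.
    """
    merged, routed = {}, {}
    for name, base_w in base.items():
        tvs = [task_vector(model[name], base_w) for model in experts.values()]
        if sign_conflict(tvs) <= threshold:
            k = len(tvs)
            # Add the mean task vector back onto the base weights.
            merged[name] = [b + sum(col) / k for b, col in zip(base_w, zip(*tvs))]
        else:
            # High conflict: keep each task's weights for routing at inference time.
            routed[name] = {task: model[name] for task, model in experts.items()}
    return merged, routed

# Toy example: two layers, two hypothetical task experts.
base = {"l0": [0.0, 0.0, 0.0, 0.0], "l1": [0.0, 0.0, 0.0, 0.0]}
experts = {
    "math": {"l0": [0.5, 0.25, -0.5, 1.0], "l1": [0.5, -0.5, 0.5, -0.5]},
    "code": {"l0": [1.5, 0.75, -1.5, 1.0], "l1": [-0.5, 0.5, -0.5, 0.5]},
}
merged, routed = merge_layers(base, experts)
```

In this toy setup the two experts agree in sign on every entry of `l0`, so it is averaged, while they disagree on every entry of `l1`, so both copies are kept for routing. The sketch leaves out the dense-plus-sparse decoupling of experts and the uncertainty-based selection, which the abstract mentions but does not specify.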
