Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
June 1, 2026
Authors: Minsik Choi, Geewook Kim
cs.AI
Abstract
Instruction tuning aligns large language models, including multimodal ones, with diverse user intents, but scaling to heterogeneous mixtures is hindered by gradient interference and bandwidth-heavy synchronization. We ask whether these two bottlenecks can be addressed jointly by training parts of the mixture independently and reconciling them once in parameter space. We develop a local quadratic theory inside a shared flat basin that yields three results: weight merging produces a curvature-weighted variance reduction; PCA-aligned conflict splitting maximizes this gain along high-curvature directions; and merging additionally acts as spectral filtering with implicit norm regularization. These results directly motivate MERIT, a decentralized merge-ready instruction-tuning pipeline that estimates dataset-level gradient conflicts, partitions the mixture along the top PCA conflict axes, fine-tunes each partition independently with no inter-partition communication, and merges once via token-weighted averaging. On Qwen2.5-VL-3B with 136 Vision-FLAN tasks, MERIT improves the 8-benchmark average from 54.3 (joint training) to 57.0. The same recipe scales to a 7B model on a 1.6M-example, 176-source mixture, where it matches or exceeds centralized joint training with minimal cost overhead, and it transfers to text-only FLAN. Our code is available at https://github.com/naver-ai/merit.
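To make the final merge step concrete, the following is a minimal sketch of token-weighted averaging over independently fine-tuned checkpoints. It is an illustration under stated assumptions, not the code from the MERIT repository: the function name `merge_token_weighted`, the plain-dict checkpoint layout, and the toy parameter names and token counts are all hypothetical, and the weights are assumed to be each partition's share of the total training tokens.

```python
from typing import Dict, List, Sequence

# Hypothetical checkpoint layout: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def merge_token_weighted(
    checkpoints: Sequence[StateDict],
    token_counts: Sequence[int],
) -> StateDict:
    """Average checkpoints, weighting each by its share of training tokens.

    Assumption: all checkpoints were fine-tuned from the same base model,
    so they share parameter names and shapes.
    """
    if not checkpoints or len(checkpoints) != len(token_counts):
        raise ValueError("need one token count per checkpoint")
    total_tokens = sum(token_counts)
    if total_tokens <= 0:
        raise ValueError("token counts must sum to a positive number")

    # Normalized weights: w_k = n_k / sum_j n_j.
    weights = [n / total_tokens for n in token_counts]

    merged: StateDict = {}
    for name, reference in checkpoints[0].items():
        accumulator = [0.0] * len(reference)
        for weight, checkpoint in zip(weights, checkpoints):
            values = checkpoint[name]
            if len(values) != len(reference):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, value in enumerate(values):
                accumulator[i] += weight * value
        merged[name] = accumulator
    return merged


# Toy usage: two partitions trained on 3000 and 1000 tokens respectively,
# so the first checkpoint receives weight 0.75 and the second 0.25.
partition_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
partition_b = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}
merged = merge_token_weighted([partition_a, partition_b], token_counts=[3000, 1000])
print(merged)
```

Because the merge needs only the final checkpoints and one scalar per partition, it runs once after training, which is consistent with the abstract's claim of no inter-partition communication during fine-tuning.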