Decentralized Instruction Tuning: Conflict-Aware Partitioning and Weight Merging

Abstract

Instruction tuning aligns large language models, including multimodal ones, with diverse user intents, but scaling to heterogeneous mixtures is hindered by gradient interference and bandwidth-heavy synchronization. We ask whether these two bottlenecks can be addressed jointly by training parts of the mixture independently and reconciling them once in parameter space. We develop a local quadratic theory inside a shared flat basin that yields three results: weight merging produces a curvature-weighted variance reduction; PCA-aligned conflict splitting maximizes this gain along high-curvature directions; and merging additionally acts as spectral filtering with implicit norm regularization. These results directly motivate MERIT, a decentralized merge-ready instruction-tuning pipeline that estimates dataset-level gradient conflicts, partitions the mixture along the top PCA conflict axes, fine-tunes each partition independently with no inter-partition communication, and merges once via token-weighted averaging. On Qwen2.5-VL-3B with 136 Vision-FLAN tasks, MERIT improves the 8-benchmark average from 54.3 (joint training) to 57.0. The same recipe scales to a 7B model on a 1.6M-example, 176-source mixture, matching or exceeding centralized joint training with minimal cost overhead, and transfers to text-only FLAN. Our code is available at https://github.com/naver-ai/merit.
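
The final merge step, token-weighted averaging of independently fine-tuned partitions, can be illustrated with a minimal sketch. This is not the released MERIT implementation: the function name `token_weighted_merge`, the representation of a checkpoint as a dictionary of flat float lists, and the use of per-partition training token counts as merge weights are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def token_weighted_merge(
    checkpoints: Sequence[Weights], token_counts: Sequence[int]
) -> Weights:
    """Average checkpoints, weighting each by its share of training tokens.

    Assumes every checkpoint has the same parameter names and shapes,
    as is the case when all partitions start from one base model.
    """
    if not checkpoints or len(checkpoints) != len(token_counts):
        raise ValueError("need one token count per checkpoint")
    total = float(sum(token_counts))
    if total <= 0:
        raise ValueError("token counts must sum to a positive number")

    merged: Weights = {}
    for name, reference in checkpoints[0].items():
        acc = [0.0] * len(reference)
        for ckpt, n_tokens in zip(checkpoints, token_counts):
            coeff = n_tokens / total  # merge coefficient for this partition
            for i, value in enumerate(ckpt[name]):
                acc[i] += coeff * value
        merged[name] = acc
    return merged


if __name__ == "__main__":
    # Two toy partitions; the second saw three times as many tokens.
    part_a = {"w": [1.0, 2.0]}
    part_b = {"w": [3.0, 6.0]}
    print(token_weighted_merge([part_a, part_b], [1, 3]))
```

Under these assumptions, a partition trained on more tokens pulls the merged parameters proportionally closer to its own solution. Because the merge happens once, after training, no gradients or parameters need to be exchanged between partitions during fine-tuning.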