Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

Abstract

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given a merge operator and a checkpoint family in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the omitted-update error is bounded by the norm of the omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves speedups of up to 11×. Representative budget sweeps show O(10^{-3}) parameter deviation from full-read merges and no monotonic degradation on downstream benchmarks.
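The core loop described above (index blocks, plan reads under a byte budget, execute only the planned reads) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not MergePipe's implementation. The following are all hypothetical: the index schema (`BlockEntry`), the greedy ranking by |λ_i|·‖Δ_{i,b}‖ per byte, the float32 cost model, and the manifest format. The sketch uses the fixed-coefficient additive operator θ = θ_base + Σ_i λ_i Δ_i. For that operator, the triangle inequality gives one concrete form of the omitted-update bound: ‖θ_full − θ_budget‖₂ ≤ Σ over omitted (i, b) of |λ_i|·‖Δ_{i,b}‖₂.

```python
import json
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockEntry:
    """Hypothetical index entry for one expert delta block."""
    expert: str   # expert checkpoint id
    block: str    # parameter block name in the shared weight coordinates
    nbytes: int   # read cost of the block
    norm: float   # L2 norm of the delta, assumed precomputed at indexing time


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def build_index(deltas):
    """deltas: {expert: {block: [floats]}}; cost model assumes 4 bytes (float32) per value."""
    return [BlockEntry(e, b, 4 * len(v), l2(v))
            for e, blocks in deltas.items() for b, v in blocks.items()]


def plan_access(index, coeffs, budget_bytes):
    """Deterministic greedy plan: largest |lambda| * ||delta|| per byte first.
    Ties break on (expert, block), so identical inputs always yield the identical plan."""
    def key(e):
        return (-abs(coeffs[e.expert]) * e.norm / max(e.nbytes, 1), e.expert, e.block)
    plan, spent = [], 0
    for entry in sorted(index, key=key):
        if spent + entry.nbytes <= budget_bytes:  # budget-sound: never overspend
            plan.append(entry)
            spent += entry.nbytes
    return plan, spent


def budgeted_merge(base, deltas, coeffs, plan):
    """theta = base + sum over planned (i, b) of lambda_i * delta_{i,b}; unplanned deltas are never read."""
    merged = {b: list(v) for b, v in base.items()}
    for e in plan:
        lam, d = coeffs[e.expert], deltas[e.expert][e.block]
        merged[e.block] = [m + lam * x for m, x in zip(merged[e.block], d)]
    return merged


def to_manifest(plan, budget_bytes):
    """Hypothetical replay record: the ordered reads performed under one budget."""
    return json.dumps({"budget_bytes": budget_bytes,
                       "reads": [[e.expert, e.block, e.nbytes] for e in plan]})


def deviation(a, b):
    """L2 distance between two merged models, over all blocks."""
    return math.sqrt(sum((x - y) ** 2 for k in a for x, y in zip(a[k], b[k])))


# Toy family: two experts, two 2-element blocks each (all values illustrative).
base = {"w0": [0.0, 0.0], "w1": [1.0, 1.0]}
deltas = {"math": {"w0": [0.5, -0.5], "w1": [0.01, 0.0]},
          "code": {"w0": [0.0, 0.2], "w1": [0.3, 0.3]}}
coeffs = {"math": 0.5, "code": 0.5}
index = build_index(deltas)
total = sum(e.nbytes for e in index)

full_plan, _ = plan_access(index, coeffs, total)       # full budget reads every block
full = budgeted_merge(base, deltas, coeffs, full_plan)
plan, spent = plan_access(index, coeffs, 16)           # half budget
approx = budgeted_merge(base, deltas, coeffs, plan)

read = {(e.expert, e.block) for e in plan}
bound = sum(abs(coeffs[e.expert]) * e.norm for e in index if (e.expert, e.block) not in read)
print(spent, deviation(full, approx) <= bound)  # prints: 16 True
```

Because the ranking uses only precomputed index metadata, the planner never touches expert weights. Budget soundness and full-budget recovery follow from the `spent + entry.nbytes <= budget_bytes` check alone, so any deterministic ranking would keep both properties. The ranking only affects how much of the error bound is spent.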