Access Sets Matter: Expert-Read Budgets for Scalable Weight-Space Model Merging

Abstract

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given a merge operator and a family of checkpoints in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the error from omitted updates is bounded by the norm of the omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves speedups of up to 11×. Representative budget sweeps show parameter deviation on the order of O(10^{-3}) from full-read merges and no monotonic degradation on downstream benchmarks.
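To make the access-set framing concrete, the following is a minimal Python sketch of a budgeted additive merge. It is not MergePipe's implementation. The function names (`build_index`, `plan_access`, `execute_merge`, `omitted_bound`), the greedy ranking by coefficient-scaled block norm, the 4-byte cost per value, and the toy data are all assumptions introduced for illustration. The bound checked here is one natural reading of the claim for fixed-coefficient additive operators: by the triangle inequality, the deviation from the full-read merge is at most the sum of |coefficient| times the norm of each omitted delta block.

```python
import math

def build_index(deltas, bytes_per_value=4):
    """Build per-block metadata (read cost, L2 norm) once, offline.
    Assumption: norms are precomputed so planning never reads the weights."""
    index = []
    for expert in sorted(deltas):
        for block in sorted(deltas[expert]):
            values = deltas[expert][block]
            index.append({
                "expert": expert,
                "block": block,
                "cost": len(values) * bytes_per_value,
                "norm": math.sqrt(sum(v * v for v in values)),
            })
    return index

def plan_access(index, coeffs, budget_bytes):
    """Deterministic greedy plan (hypothetical policy): rank blocks by
    |coefficient| * norm, break ties by name, and take each block that
    still fits in the remaining I/O budget."""
    ranked = sorted(
        index,
        key=lambda e: (-abs(coeffs[e["expert"]]) * e["norm"],
                       e["expert"], e["block"]),
    )
    plan, spent = [], 0
    for entry in ranked:
        if spent + entry["cost"] <= budget_bytes:
            plan.append((entry["expert"], entry["block"]))
            spent += entry["cost"]
    return plan, spent

def execute_merge(base, deltas, coeffs, plan):
    """Apply base + sum(coeff * delta), touching only the planned blocks."""
    merged = {block: list(values) for block, values in base.items()}
    for expert, block in plan:
        c = coeffs[expert]
        merged[block] = [w + c * d
                         for w, d in zip(merged[block], deltas[expert][block])]
    return merged

def omitted_bound(index, coeffs, plan):
    """Triangle-inequality bound on the deviation from the full-read merge."""
    chosen = set(plan)
    return sum(abs(coeffs[e["expert"]]) * e["norm"]
               for e in index if (e["expert"], e["block"]) not in chosen)

def deviation(a, b):
    """L2 distance between two merged checkpoints with the same blocks."""
    return math.sqrt(sum((x - y) ** 2
                         for block in a for x, y in zip(a[block], b[block])))

# Toy data (hypothetical): two experts, two blocks each, 8 bytes per block.
base = {"attn": [1.0, 2.0], "mlp": [0.0, 1.0]}
deltas = {
    "expert_a": {"attn": [0.5, -0.5], "mlp": [0.01, 0.0]},
    "expert_b": {"attn": [0.0, 0.02], "mlp": [1.0, 1.0]},
}
coeffs = {"expert_a": 0.5, "expert_b": 0.5}

index = build_index(deltas)
full_plan = [(e["expert"], e["block"]) for e in index]
full = execute_merge(base, deltas, coeffs, full_plan)

plan, spent = plan_access(index, coeffs, budget_bytes=16)  # half of 32 bytes
budgeted = execute_merge(base, deltas, coeffs, plan)
print("plan:", plan, "bytes read:", spent)
print("deviation:", deviation(full, budgeted),
      "bound:", omitted_bound(index, coeffs, plan))
```

In this sketch the plan depends only on the index and the coefficients, so it can be recorded and replayed without rereading any weights. A budget covering every block selects all of them, which reproduces the full-read merge up to floating-point rounding.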