The Importance of Access Sets: Budgeted Expert Reads for Scalable Weight-Space Model Merging

Abstract

Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an expert access-set problem: given a merge operator and a checkpoint family in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the omitted-update error is bounded by the norm of the omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves speedups of up to 11×. Representative budget sweeps show O(10^{-3}) parameter deviation from full-read merges and no monotonic degradation on downstream benchmarks.
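
As an illustration of the access-set formulation, the following is a minimal sketch of a budgeted merge for a fixed-coefficient additive operator. It is not MergePipe's implementation: the function names (`plan_access`, `merge`, `omitted_bound`), the greedy ranking by coefficient-weighted delta norm, and the element-count cost model are all assumptions made for this example. The bound checked at the end is only the triangle inequality applied to the omitted terms, which may be looser or differently stated than the bound in the paper.

```python
import math

def l2(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))

def plan_access(deltas, coeffs, budget):
    """Choose (expert, block) pairs to read under an element-count budget.

    Assumption: a block index would supply per-block norms without reading
    the block itself; here the norms are computed from in-memory toy data.
    """
    candidates = []
    for expert, blocks in deltas.items():
        for block, d in blocks.items():
            score = abs(coeffs[expert]) * l2(d)
            candidates.append((score, expert, block, len(d)))
    # Deterministic order: highest score first, ties broken by names.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    plan, used = [], 0
    for _score, expert, block, cost in candidates:
        if used + cost <= budget:  # never exceed the budget
            plan.append((expert, block))
            used += cost
    return plan, used

def merge(base, deltas, coeffs, plan):
    """Additive merge: base + sum of coeff * delta over the planned blocks."""
    merged = {block: list(v) for block, v in base.items()}
    for expert, block in plan:
        for i, x in enumerate(deltas[expert][block]):
            merged[block][i] += coeffs[expert] * x
    return merged

def omitted_bound(deltas, coeffs, plan):
    """Triangle-inequality bound: sum of |coeff| * ||delta|| over skipped blocks."""
    chosen = set(plan)
    return sum(
        abs(coeffs[e]) * l2(d)
        for e, blocks in deltas.items()
        for b, d in blocks.items()
        if (e, b) not in chosen
    )

# Toy data (hypothetical): two experts, two blocks of two elements each.
base = {"w0": [0.0, 0.0], "w1": [1.0, 1.0]}
deltas = {
    "A": {"w0": [3.0, 4.0], "w1": [0.1, 0.0]},
    "B": {"w0": [0.0, 0.2], "w1": [1.0, 0.0]},
}
coeffs = {"A": 0.5, "B": 0.5}

plan, used = plan_access(deltas, coeffs, budget=4)       # half of the 8 elements
full_plan, _ = plan_access(deltas, coeffs, budget=8)     # full budget
budgeted = merge(base, deltas, coeffs, plan)
full = merge(base, deltas, coeffs, full_plan)
err = l2([x - y for b in full for x, y in zip(full[b], budgeted[b])])
assert err <= omitted_bound(deltas, coeffs, plan) + 1e-12
```

In this sketch the plan is budget-sound because a block is added only when it fits in the remaining budget, and at full budget every block is selected, so the budgeted merge coincides with the full-read merge. Recording `plan` alongside the coefficients is one simple way to make a run replayable.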