

Revisiting Model Interpolation for Efficient Reasoning

October 13, 2025
作者: Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Ngai Wong
cs.AI

Abstract

Model merging, typically of Instruct and Thinking models, has shown remarkable performance for efficient reasoning. In this paper, we systematically revisit the simplest merging method, which directly interpolates the two sets of weights. In particular, we observe that model interpolation follows a three-stage evolutionary paradigm with distinct behaviors along the reasoning trajectory. These dynamics provide a principled guide for navigating the performance-cost trade-off. Empirical results demonstrate that a strategically interpolated model surprisingly surpasses sophisticated model merging baselines in both efficiency and effectiveness. We further validate our findings with extensive ablation studies on model layers, modules, and decoding strategies. Ultimately, this work demystifies model interpolation and offers a practical framework for crafting models with precisely targeted reasoning capabilities. Code is available at https://github.com/wutaiqiang/MI.
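The direct weight interpolation the abstract refers to can be sketched as a parameter-wise linear blend of two checkpoints with matching architectures. This is a minimal, hedged illustration, not the paper's exact implementation: the function name, the `alpha` parameter, and the use of plain floats in place of tensors are all illustrative choices.

```python
def interpolate_weights(instruct_sd, thinking_sd, alpha):
    """Parameter-wise linear interpolation of two state dicts.

    alpha = 0.0 returns the Instruct weights, alpha = 1.0 the Thinking
    weights; intermediate values blend the two models. Plain floats
    stand in for tensors here; with real checkpoints the same
    expression applies element-wise to each weight tensor.
    """
    # Both checkpoints must share the same architecture (same keys).
    assert instruct_sd.keys() == thinking_sd.keys(), "architectures must match"
    return {
        name: (1.0 - alpha) * instruct_sd[name] + alpha * thinking_sd[name]
        for name in instruct_sd
    }
```

Sweeping `alpha` over [0, 1] is what exposes the three-stage behavior the abstract describes, since each value yields a distinct merged model along the interpolation path.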