Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
March 26, 2025
Authors: Han Wu, Yuxuan Yao, Shuqi Liu, Zehua Liu, Xiaojin Fu, Xiongwei Han, Xing Li, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan
cs.AI
Abstract
The transition from System 1 to System 2 reasoning in large language models
(LLMs) has marked significant advancements in handling complex tasks through
deliberate, iterative thinking. However, this progress often comes at the cost
of efficiency, as models tend to overthink, generating redundant reasoning
steps without proportional improvements in output quality. Long-to-Short (L2S)
reasoning has emerged as a promising solution to this challenge, aiming to
balance reasoning depth with practical efficiency. While existing approaches,
such as supervised fine-tuning (SFT), reinforcement learning (RL), and prompt
engineering, have shown potential, they are either computationally expensive or
unstable. Model merging, on the other hand, offers a cost-effective and robust
alternative by integrating the quick-thinking capabilities of System 1 models
with the methodical reasoning of System 2 models. In this work, we present a
comprehensive empirical study on model merging for L2S reasoning, exploring
diverse methodologies, including task-vector-based, SVD-based, and
activation-informed merging. Our experiments reveal that model merging can
reduce average response length by up to 55% while preserving or even improving
baseline performance. Through extensive evaluations on 1.5B/7B/14B/32B models,
we also identify a strong correlation between model scale and merging efficacy.
Furthermore, we investigate the merged model's ability to self-critique and
self-correct, as well as its ability to adapt response length to task complexity.
Our findings highlight model merging as a highly efficient and effective
paradigm for L2S reasoning, offering a practical solution to the overthinking
problem while maintaining the robustness of System 2 reasoning. This work is
available on GitHub: https://github.com/hahahawu/Long-to-Short-via-Model-Merging.
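The abstract names task-vector-based merging as one of the evaluated families. As a rough illustration only, the sketch below shows a simple task-arithmetic variant that treats the quick-thinking (System 1) checkpoint as the base and adds a scaled task vector from the long-chain-of-thought (System 2) checkpoint; the model paths and merging coefficient are hypothetical placeholders and both models are assumed to share the same architecture and tokenizer. This is not the paper's exact configuration.

```python
# Minimal sketch of task-vector-based (task-arithmetic) merging for L2S reasoning.
# Assumptions: both checkpoints share the same architecture/tokenizer; paths and
# the coefficient ALPHA are illustrative placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM

SYSTEM1_PATH = "path/to/system1-instruct-model"   # quick-thinking (System 1) model
SYSTEM2_PATH = "path/to/system2-long-cot-model"   # deliberate long-CoT (System 2) model
ALPHA = 0.5                                       # merging coefficient; tune on a validation set

system1 = AutoModelForCausalLM.from_pretrained(SYSTEM1_PATH, torch_dtype=torch.bfloat16)
system2 = AutoModelForCausalLM.from_pretrained(SYSTEM2_PATH, torch_dtype=torch.bfloat16)

s1_state = system1.state_dict()
merged_state = {}
for name, s2_param in system2.state_dict().items():
    # Task vector = System 2 weights minus System 1 weights; scaling it by ALPHA
    # interpolates between short, fast responses and long, deliberate reasoning.
    merged_state[name] = s1_state[name] + ALPHA * (s2_param - s1_state[name])

system1.load_state_dict(merged_state)
system1.save_pretrained("merged-l2s-model")
```

In this simplified recipe, ALPHA controls how much of the System 2 behavior is injected: smaller values keep responses closer to the concise System 1 style, larger values recover more of the long reasoning traces.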