Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
March 26, 2025
Authors: Han Wu, Yuxuan Yao, Shuqi Liu, Zehua Liu, Xiaojin Fu, Xiongwei Han, Xing Li, Hui-Ling Zhen, Tao Zhong, Mingxuan Yuan
cs.AI
Abstract
The transition from System 1 to System 2 reasoning in large language models
(LLMs) has marked significant advancements in handling complex tasks through
deliberate, iterative thinking. However, this progress often comes at the cost
of efficiency, as models tend to overthink, generating redundant reasoning
steps without proportional improvements in output quality. Long-to-Short (L2S)
reasoning has emerged as a promising solution to this challenge, aiming to
balance reasoning depth with practical efficiency. While existing approaches,
such as supervised fine-tuning (SFT), reinforcement learning (RL), and prompt
engineering, have shown potential, they are either computationally expensive or
unstable. Model merging, on the other hand, offers a cost-effective and robust
alternative by integrating the quick-thinking capabilities of System 1 models
with the methodical reasoning of System 2 models. In this work, we present a
comprehensive empirical study on model merging for L2S reasoning, exploring
diverse methodologies, including task-vector-based, SVD-based, and
activation-informed merging. Our experiments reveal that model merging can
reduce average response length by up to 55% while preserving or even improving
baseline performance. We also identify a strong correlation between model scale
and merging efficacy through extensive evaluations of 1.5B/7B/14B/32B models.
Furthermore, we investigate the merged model's ability to self-critique and
self-correct, as well as how it adapts its response length to task complexity.
Our findings highlight model merging as a highly efficient and effective
paradigm for L2S reasoning, offering a practical solution to the overthinking
problem while maintaining the robustness of System 2 reasoning. This work can
be found on GitHub: https://github.com/hahahawu/Long-to-Short-via-Model-Merging.
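
To make the task-vector-based merging mentioned in the abstract concrete, below is a minimal Python sketch of task-arithmetic merging between a fast-thinking (System 1) model and a long-CoT (System 2) model that share a common base. The checkpoint paths, merging weights, and output directory are illustrative placeholders, not the paper's actual configuration; see the linked repository for the authors' implementation.

import torch
from transformers import AutoModelForCausalLM

# Illustrative checkpoint names; replace with real paths. All three models
# must share the same architecture and parameter names.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model", torch_dtype=torch.bfloat16)
fast = AutoModelForCausalLM.from_pretrained("path/to/system1-model", torch_dtype=torch.bfloat16)
slow = AutoModelForCausalLM.from_pretrained("path/to/system2-model", torch_dtype=torch.bfloat16)

alpha, beta = 0.5, 0.5  # assumed merging weights for the two task vectors

base_state = base.state_dict()
fast_state = fast.state_dict()
slow_state = slow.state_dict()
merged_state = {}

for name, base_param in base_state.items():
    if not torch.is_floating_point(base_param):
        # Copy non-floating-point buffers (if any) unchanged.
        merged_state[name] = base_param
        continue
    # Task vector = fine-tuned weights minus base weights.
    tau_fast = fast_state[name] - base_param
    tau_slow = slow_state[name] - base_param
    # Merged weights = base + weighted sum of task vectors.
    merged_state[name] = base_param + alpha * tau_fast + beta * tau_slow

base.load_state_dict(merged_state)
base.save_pretrained("merged-l2s-model")

SVD-based and activation-informed merging build on the same additive structure, typically by low-rank-approximating the task vectors or weighting them with activation statistics, respectively.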