Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

Abstract

The transition from System 1 to System 2 reasoning in large language models (LLMs) has brought significant advances in handling complex tasks through deliberate, iterative thinking. However, this progress often comes at the cost of efficiency: models tend to overthink, generating redundant reasoning steps without proportional improvements in output quality. Long-to-Short (L2S) reasoning has emerged as a promising solution to this challenge, aiming to balance reasoning depth with practical efficiency. Existing approaches, such as supervised fine-tuning (SFT), reinforcement learning (RL), and prompt engineering, have shown potential, but they are either computationally expensive or unstable. Model merging, by contrast, offers a cost-effective and robust alternative by integrating the quick-thinking capabilities of System 1 models with the methodical reasoning of System 2 models. In this work, we present a comprehensive empirical study of model merging for L2S reasoning, exploring diverse methodologies, including task-vector-based, SVD-based, and activation-informed merging. Our experiments show that model merging can reduce average response length by up to 55% while preserving or even improving baseline performance. Through extensive evaluations on 1.5B/7B/14B/32B models, we also identify a strong correlation between model scale and merging efficacy. Furthermore, we investigate the merged model's ability to self-critique and self-correct, as well as how it adapts response length to task complexity. Our findings highlight model merging as a highly efficient and effective paradigm for L2S reasoning, offering a practical solution to the overthinking problem while maintaining the robustness of System 2 reasoning. This work is available on GitHub: https://github.com/hahahawu/Long-to-Short-via-Model-Merging.
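To make the task-vector-based family mentioned in the abstract more concrete, the following is a minimal sketch of generic task arithmetic. It is not taken from the paper or its repository. It assumes that a quick-thinking (System 1) model and a long-reasoning (System 2) model were both fine-tuned from the same base checkpoint, and that weights can be treated as flat lists of floats. The names `task_vector`, `merge_task_vectors`, and the toy parameter `layer.weight` are hypothetical, and the 0.5 coefficients are arbitrary illustrative values.

```python
from typing import Dict, List

# Assumption: each model is a mapping from parameter name to flattened values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Return the per-parameter difference (finetuned - base)."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_task_vectors(
    base: Weights, finetuned_models: List[Weights], coeffs: List[float]
) -> Weights:
    """Add the scaled task vector of each fine-tuned model to the base weights."""
    if len(finetuned_models) != len(coeffs):
        raise ValueError("need exactly one coefficient per fine-tuned model")
    merged = {name: list(values) for name, values in base.items()}
    for model, coeff in zip(finetuned_models, coeffs):
        delta = task_vector(model, base)
        for name, values in delta.items():
            merged[name] = [m + coeff * d for m, d in zip(merged[name], values)]
    return merged


# Toy weights (hypothetical values, chosen only to keep the arithmetic visible).
base = {"layer.weight": [0.0, 1.0, 2.0]}
system1 = {"layer.weight": [1.0, 1.0, 2.0]}  # stands in for the quick-thinking model
system2 = {"layer.weight": [0.0, 3.0, 2.0]}  # stands in for the long-reasoning model

merged = merge_task_vectors(base, [system1, system2], [0.5, 0.5])
print(merged)  # {'layer.weight': [0.5, 2.0, 2.0]}
```

In this sketch the coefficients control how much of each model's behavior is carried into the merged weights. The SVD-based and activation-informed methods named in the abstract choose or reshape these updates in more elaborate ways that this example does not attempt to reproduce.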
