Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

Abstract

The transition from System 1 to System 2 reasoning in large language models (LLMs) has brought significant advances in handling complex tasks through deliberate, iterative thinking. However, this progress often comes at the cost of efficiency, as models tend to overthink, generating redundant reasoning steps without proportional improvements in output quality. Long-to-Short (L2S) reasoning has emerged as a promising solution to this challenge, aiming to balance reasoning depth with practical efficiency. Existing approaches, such as supervised fine-tuning (SFT), reinforcement learning (RL), and prompt engineering, have shown potential, but they are either computationally expensive or unstable. Model merging, on the other hand, offers a cost-effective and robust alternative by integrating the quick-thinking capabilities of System 1 models with the methodical reasoning of System 2 models. In this work, we present a comprehensive empirical study of model merging for L2S reasoning, exploring diverse methodologies, including task-vector-based, SVD-based, and activation-informed merging. Our experiments reveal that model merging can reduce average response length by up to 55% while preserving or even improving baseline performance. Through extensive evaluations on 1.5B/7B/14B/32B models, we also identify a strong correlation between model scale and merging efficacy. Furthermore, we investigate the merged model's ability to self-critique and self-correct, as well as its adaptive response length based on task complexity. Our findings highlight model merging as a highly efficient and effective paradigm for L2S reasoning, offering a practical solution to the overthinking problem while maintaining the robustness of System 2 reasoning. This work is available on GitHub at https://github.com/hahahawu/Long-to-Short-via-Model-Merging.
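To make the task-vector-based family concrete, the following is a minimal sketch of generic task-vector merging (task arithmetic), not the implementation used in this work. It assumes that a quick-thinking model and a slow-thinking model share the same architecture and a common base checkpoint, and that parameters can be represented as flat lists of floats. The function names `task_vector` and `merge_task_vectors`, the toy parameter dictionaries, and the equal merging weights of 0.5 are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def task_vector(finetuned: StateDict, base: StateDict) -> StateDict:
    """Return the per-parameter difference (finetuned - base)."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_task_vectors(
    base: StateDict,
    finetuned_models: List[StateDict],
    weights: List[float],
) -> StateDict:
    """Add a weighted sum of task vectors to the shared base parameters."""
    if len(finetuned_models) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned model")
    # Copy the base so the input checkpoint is not modified in place.
    merged = {name: list(values) for name, values in base.items()}
    for model, weight in zip(finetuned_models, weights):
        delta = task_vector(model, base)
        for name in merged:
            merged[name] = [
                m + weight * d for m, d in zip(merged[name], delta[name])
            ]
    return merged


# Hypothetical toy checkpoints (illustrative values, not real model weights).
base = {"layer.weight": [1.0, 2.0]}
fast_model = {"layer.weight": [1.5, 2.0]}  # stands in for a System 1 model
slow_model = {"layer.weight": [1.0, 4.0]}  # stands in for a System 2 model

merged = merge_task_vectors(base, [fast_model, slow_model], [0.5, 0.5])
print(merged)
```

In this sketch the merging weights control how strongly each model's behavior is carried into the merged parameters. The SVD-based and activation-informed methods named in the abstract are not covered by this example.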
