LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
October 9, 2025
Authors: Jingyuan Wang, Yankai Chen, Zhonghang Li, Chao Huang
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable progress in
reasoning, often through supervised fine-tuning (SFT). However, SFT is
resource-intensive, relying on large curated datasets, rejection-sampled
demonstrations, and uniform optimization across all tokens, even though only a
fraction carry meaningful learning value. In this work, we explore a
counterintuitive idea: can smaller language models (SLMs) teach larger language
models (LLMs) by revealing high-value reasoning moments that reflect the
latter's unique strength? We propose LightReasoner, a novel framework that
leverages the behavioral divergence between a stronger expert model (LLM) and a
weaker amateur model (SLM). LightReasoner operates in two stages: (1) a
sampling stage that pinpoints critical reasoning moments and constructs
supervision examples capturing the expert's advantage through expert-amateur
contrast, and (2) a fine-tuning stage that aligns the expert model with these
distilled examples, amplifying its reasoning strengths. Across seven
mathematical benchmarks, LightReasoner improves accuracy by up to 28.1%, while
reducing time consumption by 90%, sampled problems by 80%, and tuned token
usage by 99%, all without relying on ground-truth labels. By turning weaker
SLMs into effective teaching signals, LightReasoner offers a scalable and
resource-efficient approach for advancing LLM reasoning. Code is available at:
https://github.com/HKUDS/LightReasoner
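To make the sampling stage concrete, the sketch below illustrates one plausible way to "pinpoint critical reasoning moments" via expert-amateur contrast: score each reasoning step by the divergence between the expert's and the amateur's next-token distributions, and keep only the most divergent steps as supervision examples. The function names and the KL-based scoring rule here are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import math

def softmax(logits):
    # Convert raw logits to a probability distribution (numerically stable).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q); eps guards against log(0) for near-zero probabilities.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_critical_steps(expert_logits_seq, amateur_logits_seq, top_k=2):
    """Rank reasoning steps by expert-amateur divergence and keep the top_k.

    Steps where the expert's predictive distribution differs most from the
    amateur's are treated as the high-value moments worth fine-tuning on;
    steps where both models agree carry little learning signal and are skipped.
    """
    scores = []
    for t, (e_logits, a_logits) in enumerate(
        zip(expert_logits_seq, amateur_logits_seq)
    ):
        score = kl_divergence(softmax(e_logits), softmax(a_logits))
        scores.append((score, t))
    scores.sort(reverse=True)  # highest divergence first
    return sorted(t for _, t in scores[:top_k])

# Toy example: three reasoning steps over a 3-token vocabulary.
expert = [[1.0, 1.0, 1.0], [5.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
amateur = [[1.0, 1.0, 1.0], [0.0, 0.0, 5.0], [2.0, 1.0, 0.0]]
critical = select_critical_steps(expert, amateur, top_k=1)
print(critical)  # only the step where the two models sharply disagree
```

Selecting a small top_k is what would drive the reported efficiency gains: most tokens are discarded, and only the contrastive minority is used for fine-tuning.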