LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

October 9, 2025
Authors: Jingyuan Wang, Yankai Chen, Zhonghang Li, Chao Huang
cs.AI

Abstract

Large language models (LLMs) have demonstrated remarkable progress in reasoning, often through supervised fine-tuning (SFT). However, SFT is resource-intensive, relying on large curated datasets, rejection-sampled demonstrations, and uniform optimization across all tokens, even though only a fraction carry meaningful learning value. In this work, we explore a counterintuitive idea: can smaller language models (SLMs) teach larger language models (LLMs) by revealing high-value reasoning moments that reflect the latter's unique strength? We propose LightReasoner, a novel framework that leverages the behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM). LightReasoner operates in two stages: (1) a sampling stage that pinpoints critical reasoning moments and constructs supervision examples capturing the expert's advantage through expert-amateur contrast, and (2) a fine-tuning stage that aligns the expert model with these distilled examples, amplifying its reasoning strengths. Across seven mathematical benchmarks, LightReasoner improves accuracy by up to 28.1%, while reducing time consumption by 90%, sampled problems by 80%, and tuned token usage by 99%, all without relying on ground-truth labels. By turning weaker SLMs into effective teaching signals, LightReasoner offers a scalable and resource-efficient approach for advancing LLM reasoning. Code is available at: https://github.com/HKUDS/LightReasoner
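
The sampling stage hinges on contrasting the expert LLM's and the amateur SLM's next-token behavior to find the steps where the expert's advantage is largest. Below is a minimal sketch of that idea, not the paper's actual implementation: it assumes both models share a tokenizer and vocabulary (e.g., the same model family), uses KL divergence as an illustrative contrast score, and the model names and threshold are placeholders.

```python
# Sketch: score each reasoning step by the divergence between the expert's and
# the amateur's next-token distributions, and keep the high-divergence steps as
# candidate supervision examples. Assumptions (not from the paper): shared
# tokenizer, KL-based scoring, placeholder model names and threshold.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

EXPERT_NAME = "Qwen/Qwen2.5-7B-Instruct"     # placeholder expert (LLM)
AMATEUR_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder amateur (SLM)

tokenizer = AutoTokenizer.from_pretrained(EXPERT_NAME)
expert = AutoModelForCausalLM.from_pretrained(EXPERT_NAME)
amateur = AutoModelForCausalLM.from_pretrained(AMATEUR_NAME)
expert.eval()
amateur.eval()

@torch.no_grad()
def contrast_scores(prompt: str, reasoning: str) -> torch.Tensor:
    """Per-step KL(expert || amateur) over the tokens of a reasoning trace."""
    ids = tokenizer(prompt + reasoning, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    p = F.log_softmax(expert(ids).logits, dim=-1)   # expert log-probs
    q = F.log_softmax(amateur(ids).logits, dim=-1)  # amateur log-probs
    # KL divergence at each position that predicts the next token
    kl = (p.exp() * (p - q)).sum(dim=-1).squeeze(0)
    # Keep only the positions that generate reasoning tokens
    return kl[prompt_len - 1 : -1]

def select_supervision_steps(prompt: str, reasoning: str, threshold: float = 1.0):
    """Indices of reasoning steps where the expert clearly diverges from the amateur."""
    scores = contrast_scores(prompt, reasoning)
    return [i for i, s in enumerate(scores.tolist()) if s > threshold]
```

In the framework described above, such contrasted steps are then turned into supervision examples for fine-tuning the expert; the KL-based score and fixed threshold here merely stand in for whatever contrast criterion the released code uses.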