
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

March 25, 2026
作者: Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, Dohyung Kim, Jiwon Jeon, Dongsheng Li, Yuqing Yang
cs.AI

Abstract

Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening reasoning traces. However, in mathematical reasoning, we find that it can reduce response length while degrading performance. We trace this degradation to the suppression of epistemic verbalization, i.e., the model's expression of uncertainty during reasoning. Through controlled experiments varying conditioning-context richness and task coverage, we show that conditioning the teacher on rich information suppresses uncertainty expression, enabling rapid in-domain optimization with limited task coverage but harming out-of-distribution (OOD) performance, where unseen problems benefit from the model expressing uncertainty and adjusting accordingly. Across Qwen3-8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct, we observe performance drops of up to 40%. Our findings highlight that exposing appropriate levels of uncertainty is crucial for robust reasoning, and underscore the importance of optimizing reasoning behavior beyond merely reinforcing correct answer traces.
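One way to make "epistemic verbalization" concrete is to count hedging phrases ("maybe", "wait", "let me check", …) in a model's reasoning trace. The sketch below is purely illustrative: the marker list `HEDGE_MARKERS` and the per-100-words metric are assumptions for exposition, not the measurement the paper actually uses.

```python
import re

# Hypothetical hedging lexicon -- the paper's actual criteria for
# epistemic verbalization are not specified in this abstract.
HEDGE_MARKERS = [
    r"\bwait\b", r"\bhmm\b", r"\bmaybe\b", r"\bnot sure\b",
    r"\blet me (?:re)?check\b", r"\balternatively\b", r"\bi think\b",
]

def hedge_rate(trace: str) -> float:
    """Return hedging markers per 100 words of a reasoning trace."""
    words = len(trace.split())
    if words == 0:
        return 0.0
    hits = sum(len(re.findall(p, trace, flags=re.IGNORECASE))
               for p in HEDGE_MARKERS)
    return 100.0 * hits / words

# A verbose, uncertainty-expressing trace vs. a terse, confident one.
before = "Hmm, maybe x = 3? Wait, let me check: 3 * 3 = 9, so x = 3."
after = "x = 3 since 3 * 3 = 9."
assert hedge_rate(before) > hedge_rate(after)
```

Comparing such a rate on traces before and after self-distillation would surface the kind of suppression the abstract describes: shorter traces with fewer verbalized uncertainty markers.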