
Embarrassingly Simple Self-Distillation Improves Code Generation

April 1, 2026
作者: Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang
cs.AI

Abstract

Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems, and it generalizes across Qwen and Llama models at 4B, 8B, and 30B scale, including both instruct and thinking variants. To understand why such a simple method can work, we trace these gains to a precision-exploration conflict in LLM decoding and show that SSD reshapes token distributions in a context-dependent way, suppressing distractor tails where precision matters while preserving useful diversity where exploration matters. Taken together, SSD offers a complementary post-training direction for improving LLM code generation.
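The abstract attributes SSD's gains to reshaping the next-token distribution: temperature scaling plus truncated sampling suppress the low-probability "distractor tail" while keeping mass on plausible alternatives. The paper's exact sampling configuration is not given here, so the sketch below is only illustrative: it applies temperature scaling followed by nucleus (top-p) truncation to a toy next-token distribution; the function name and all numeric values are assumptions, not the authors' settings.

```python
import math

def reshape_distribution(probs, temperature=0.7, top_p=0.9):
    """Temperature-scale a probability vector, then apply nucleus
    (top-p) truncation and renormalize. Illustrative only: the
    temperature and top_p values are not taken from the paper."""
    # Temperature scaling: p_i^(1/T), computed in log space for stability.
    logits = [math.log(p) / temperature for p in probs]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    scaled = [e / z for e in exps]

    # Nucleus truncation: keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p; zero out the tail.
    order = sorted(range(len(scaled)), key=lambda i: -scaled[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += scaled[i]
        if cum >= top_p:
            break
    total = sum(scaled[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = scaled[i] / total
    return out

# Toy 5-token next-token distribution (hypothetical values).
dist = [0.60, 0.25, 0.10, 0.04, 0.01]
reshaped = reshape_distribution(dist, temperature=0.7, top_p=0.9)
```

With these settings, the two lowest-probability tokens fall outside the nucleus and are zeroed out, while the top token's probability increases relative to the original distribution, which is the tail-suppression effect the abstract describes.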