Information-Preserving Reformulation of Reasoning Traces for Antidistillation

October 13, 2025
Authors: Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei
cs.AI

Abstract

Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model's problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.
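To make the two-step reformulation concrete, below is a minimal, rule-based sketch in Python. Note that in the paper the reformulation is performed by a trained small auxiliary model, not fixed rules; the self-talk regex markers and the "conclusions-first" reordering heuristic here are illustrative assumptions, not the authors' implementation. (As a sanity check on the reported result: (54.17 - 46.88) / 54.17 ≈ 0.135, matching the stated 13.5% relative degradation.)

```python
import re

# Illustrative markers of "self-talk" filler in a reasoning trace. These
# patterns are assumptions for demonstration; PART trains a small auxiliary
# model to perform this step rather than applying fixed rules.
SELF_TALK_PATTERNS = [
    r"\b(?:wait|hmm|let me think|actually|hold on)\b[^.\n]*\.",
    r"\blet me (?:double-check|re-?read)\b[^.\n]*\.",
]

def remove_self_talk(trace: str) -> str:
    """Step 1: drop conversational filler that helps a human follow the
    trace but offers easy token-level supervision to a student model."""
    for pattern in SELF_TALK_PATTERNS:
        trace = re.sub(pattern, "", trace, flags=re.IGNORECASE)
    # Collapse leftover runs of spaces/tabs, keeping line breaks intact.
    return re.sub(r"[ \t]{2,}", " ", trace).strip()

def reorder_subconclusions(steps: list[str]) -> list[str]:
    """Step 2: rearrange sub-conclusions so the trace no longer follows the
    step-by-step derivation order a student model would imitate, while every
    intermediate fact remains available to a human reader."""
    # Illustrative heuristic: surface sub-conclusions first, derivations after.
    conclusions = [s for s in steps if s.lower().startswith("so ")]
    derivations = [s for s in steps if not s.lower().startswith("so ")]
    return conclusions + derivations

def reformulate(trace: str) -> str:
    """Apply both PART steps to a newline-separated reasoning trace."""
    steps = [s.strip() for s in remove_self_talk(trace).split("\n") if s.strip()]
    return "\n".join(reorder_subconclusions(steps))

if __name__ == "__main__":
    trace = (
        "Let me think about the triangle first.\n"
        "The base is 6 and the height is 4.\n"
        "So the area is (6 * 4) / 2 = 12.\n"
        "Wait, I should double-check the height. The height is indeed 4.\n"
        "So the final answer is 12."
    )
    print(reformulate(trace))
```

The property the sketch tries to mirror is that no factual content is lost: both steps alter presentation (filler and ordering) rather than the underlying intermediate results, which is what makes the reformulation information-preserving for human readers while disrupting supervised fine-tuning.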