Information-Preserving Reformulation of Reasoning Traces for Antidistillation
October 13, 2025
Authors: Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei
cs.AI
Abstract
Recent advances in Large Language Models (LLMs) show that extending the
length of reasoning chains significantly improves performance on complex tasks.
While revealing these reasoning traces helps users better follow, verify, and
learn from the model's problem-solving process, it also makes them highly
vulnerable to unauthorized distillation. To mitigate this risk, proprietary
model providers often adopt aggressive protection strategies, such as replacing
detailed reasoning with brief summaries, which deprive users of valuable
intermediate information. To address this trade-off, we propose PART, an
information-preserving antidistillation reformulation of reasoning traces.
Motivated by the difference between how humans understand reasoning traces and
how LLMs exploit them for supervised fine-tuning, we design a simple but
effective two-step reformulation: removing self-talk behaviors and reordering
sub-conclusions. A small auxiliary model is trained to perform this
reformulation, incurring minimal computational overhead. Extensive experiments
demonstrate that PART consistently disrupts distillation across student models
of different sizes and types on various reasoning benchmarks. For instance,
when trained on reformulated traces, even a large 32B student model drops
from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.
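For intuition only, below is a minimal, hypothetical sketch of the two reformulation steps named in the abstract: removing self-talk and reordering sub-conclusions. It is not the paper's method; PART trains a small auxiliary model to perform the rewrite, whereas the regex heuristics, the SELF_TALK_PATTERNS list, and the reformulate_trace function here are invented purely to make the two steps concrete.

```python
import re

# Illustrative self-talk markers. PART itself trains a small auxiliary model
# to perform the rewrite; this fixed pattern list is a hypothetical stand-in.
SELF_TALK_PATTERNS = [
    r"^\s*(wait|hmm|let me think|actually|okay so)[,. ]",
    r"^\s*let me (double[- ]check|reconsider|try again)\b",
]

def remove_self_talk(sentences):
    """Step 1: drop sentences that read as conversational self-talk."""
    return [
        s for s in sentences
        if not any(re.search(p, s, flags=re.IGNORECASE) for p in SELF_TALK_PATTERNS)
    ]

def reorder_sub_conclusions(sentences):
    """Step 2: move sentences that state sub-conclusions ahead of the
    derivation sentences that produced them (a crude heuristic here)."""
    conclusions = [s for s in sentences
                   if re.match(r"^\s*(so|therefore|thus)\b", s, re.IGNORECASE)]
    derivations = [s for s in sentences if s not in conclusions]
    return conclusions + derivations

def reformulate_trace(trace: str) -> str:
    """Apply the two illustrative steps to a raw reasoning trace."""
    sentences = re.split(r"(?<=[.!?])\s+", trace.strip())
    return " ".join(reorder_sub_conclusions(remove_self_talk(sentences)))

if __name__ == "__main__":
    raw = ("Hmm, let me think. The legs are 3 and 4. "
           "By the Pythagorean theorem the hypotenuse is 5. "
           "So the answer is 5. Wait, let me double-check that.")
    print(reformulate_trace(raw))
```

The intent mirrored by this toy example is that the rewritten trace keeps the substantive intermediate steps for human readers while disrupting the surface form that supervised fine-tuning on raw traces exploits.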