The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
April 18, 2026
Authors: Jiaxin Zhang, Xiangyu Peng, Qinglin Chen, Qinyuan Ye, Caiming Xiong, Chien-Sheng Wu
cs.AI
Abstract
On-policy distillation (OPD) is an increasingly important paradigm for post-training language models. However, we identify a pervasive Scaling Law of Miscalibration: while OPD effectively improves task accuracy, it systematically traps models in severe overconfidence. We trace this failure to an information mismatch: teacher supervision is formed under privileged context available during training, whereas the deployed model must report confidence using only deployment-time information. We formalize this perspective theoretically, showing that teacher-conditioned success is generally not a valid target for deployment-time confidence and that helpful privileged context induces entropy collapse and a systematic optimism bias. To address this, we propose a calibration-aware OPD framework, CaOPD, that estimates empirical confidence from model rollouts, replaces self-reported confidence with this student-grounded target, and distills the revised response through the same self-distillation pipeline. Experiments across various models and domains show that CaOPD achieves Pareto-optimal calibration while maintaining competitive capability, generalizing robustly in out-of-distribution and continual-learning settings. Our findings highlight that capability distillation does not imply calibrated confidence, and that confidence should be treated as an essential objective in post-training. Code: https://github.com/SalesforceAIResearch/CaOPD
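The two measurable quantities the abstract turns on, a student-grounded empirical confidence (fraction of on-policy rollouts that succeed) and overconfidence as the gap between reported confidence and realized accuracy, can be sketched as below. This is a minimal illustration, not the paper's implementation; `sample_answer` and the equal-width-bin expected calibration error (ECE) are standard assumptions for how such targets and gaps are typically computed.

```python
import random


def empirical_confidence(sample_answer, prompt, reference, n_rollouts=16, rng=None):
    """Student-grounded confidence target: the fraction of on-policy
    rollouts whose answer matches the reference.

    `sample_answer(prompt, rng)` is a hypothetical stand-in for sampling
    one response from the student model.
    """
    rng = rng or random.Random(0)
    hits = sum(sample_answer(prompt, rng) == reference for _ in range(n_rollouts))
    return hits / n_rollouts


def expected_calibration_error(confidences, correctness, n_bins=10):
    """Standard equal-width-bin ECE: the bin-weighted absolute gap
    between average reported confidence and average accuracy.
    A well-calibrated model has ECE near 0; overconfidence shows up
    as confidence exceeding accuracy inside the bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are half-open (lo, hi]; bin 0 also catches confidence 0.0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        avg_acc = sum(correctness[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_acc - avg_conf)
    return ece
```

In this framing, the miscalibration the paper reports would appear as high self-reported confidences paired with lower realized accuracy (large ECE), while the rollout-based target above is calibrated by construction on the training distribution.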