Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
April 6, 2026
Authors: Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim
cs.AI
Abstract
Transferring knowledge from a cross-encoder teacher via Knowledge Distillation (KD) has become a standard paradigm for training retrieval models. While existing studies have largely focused on mining hard negatives to improve discrimination, the systematic composition of training data and the resulting teacher score distribution have received comparatively little attention. In this work, we highlight that focusing solely on hard negatives prevents the student from learning the comprehensive preference structure of the teacher, potentially hampering generalization. To effectively emulate the teacher score distribution, we propose a Stratified Sampling strategy that uniformly covers the entire score spectrum. Experiments on in-domain and out-of-domain benchmarks confirm that Stratified Sampling, which preserves the variance and entropy of teacher scores, serves as a robust baseline, significantly outperforming top-K and random sampling in diverse settings. These findings suggest that the essence of distillation lies in preserving the diverse range of relative scores perceived by the teacher.
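The core idea, sampling training candidates so that the full teacher score spectrum is represented rather than only the hardest negatives, can be sketched as equal-width binning over teacher scores. This is a minimal illustration, not the paper's implementation: the bin count, per-bin quota, and function name are assumptions.

```python
import random


def stratified_sample(scores, k_bins=5, per_bin=2, seed=0):
    """Sample candidate indices so every region of the teacher score
    spectrum is covered, instead of taking only top-scoring candidates.

    `k_bins` and `per_bin` are illustrative hyperparameters.
    """
    rng = random.Random(seed)
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / k_bins or 1.0  # guard against a constant score list
    # Assign each candidate to an equal-width score stratum.
    bins = [[] for _ in range(k_bins)]
    for i, s in enumerate(scores):
        b = min(int((s - lo) / width), k_bins - 1)
        bins[b].append(i)
    # Draw uniformly from every non-empty stratum.
    sampled = []
    for bucket in bins:
        take = min(per_bin, len(bucket))
        sampled.extend(rng.sample(bucket, take))
    return sorted(sampled)
```

In contrast, a top-K scheme would keep only the highest-scoring (hardest) candidates, collapsing the score distribution the student sees; the stratified draw above keeps low-, mid-, and high-scoring candidates, which is what preserves the variance and entropy of the teacher scores.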