Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval
April 6, 2026
Authors: Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim
cs.AI
Abstract
Transferring knowledge from a cross-encoder teacher via Knowledge Distillation (KD) has become a standard paradigm for training retrieval models. While existing studies have largely focused on mining hard negatives to improve discrimination, the systematic composition of the training data and the resulting distribution of teacher scores have received comparatively little attention. In this work, we highlight that focusing solely on hard negatives prevents the student from learning the teacher's full preference structure, potentially hampering generalization. To emulate the teacher score distribution effectively, we propose a Stratified Sampling strategy that uniformly covers the entire score spectrum. Experiments on in-domain and out-of-domain benchmarks confirm that Stratified Sampling, by preserving the variance and entropy of teacher scores, serves as a robust baseline, significantly outperforming top-K and random sampling across diverse settings. These findings suggest that the essence of distillation lies in preserving the diverse range of relative scores perceived by the teacher.
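The abstract does not include an implementation, so the following is only a minimal sketch of the general idea under my own assumptions (function name, bin count, and per-bin sample size are illustrative, not the authors' method): for one query's candidate pool, partition the teacher scores into equal-width strata and draw candidates from every stratum, rather than keeping only the top-K highest-scoring ones.

```python
import random
from collections import defaultdict

def stratified_sample(candidates, num_bins=8, per_bin=2, seed=0):
    """Illustrative stratified sampling over teacher scores.

    candidates: list of (doc_id, teacher_score) pairs for a single query.
    Returns pairs drawn from equal-width score bins (strata), so that easy,
    medium, and hard negatives are all represented, not just the hardest.
    """
    rng = random.Random(seed)
    scores = [s for _, s in candidates]
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal scores

    # Assign each candidate to a stratum based on its teacher score.
    bins = defaultdict(list)
    for doc_id, score in candidates:
        idx = min(int((score - lo) / width), num_bins - 1)
        bins[idx].append((doc_id, score))

    # Draw the same number of candidates from every non-empty stratum,
    # covering the whole score spectrum instead of only the top of it.
    sampled = []
    for idx in range(num_bins):
        bucket = bins.get(idx, [])
        rng.shuffle(bucket)
        sampled.extend(bucket[:per_bin])
    return sampled
```

In a typical KD pipeline, the student would then be trained to match the teacher's scores over these sampled candidates (e.g., with a KL-divergence loss), which is where covering the full score range, rather than only hard negatives, matters.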