Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval
September 30, 2025
Authors: Nima Sheikholeslami, Erfan Hosseini, Patrice Bechard, Srivatsava Daruru, Sai Rajeswar
cs.AI
Abstract
Dual-encoder retrievers depend on the principle that relevant documents
should score higher than irrelevant ones for a given query. Yet the dominant
Noise Contrastive Estimation (NCE) objective, which underpins Contrastive Loss,
optimizes a softened ranking surrogate that we rigorously prove is
fundamentally oblivious to score separation quality and unrelated to AUC. This
mismatch leads to poor calibration and suboptimal performance in downstream
tasks like retrieval-augmented generation (RAG). To address this fundamental
limitation, we introduce the MW loss, a new training objective that maximizes
the Mann-Whitney U statistic, which is mathematically equivalent to the Area
under the ROC Curve (AUC). MW loss encourages each positive-negative pair to be
correctly ranked by minimizing binary cross-entropy over score differences. We
provide theoretical guarantees that MW loss directly upper-bounds one minus the
AUC (the pairwise ranking error), better aligning optimization with retrieval
goals. We further promote ROC curves and AUC as natural, threshold-free
diagnostics for evaluating retriever
calibration and ranking quality. Empirically, retrievers trained with MW loss
consistently outperform contrastive counterparts in AUC and standard retrieval
metrics. Our experiments show that MW loss is an empirically superior
alternative to Contrastive Loss, yielding better-calibrated and more
discriminative retrievers for high-stakes applications like RAG.
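The pairwise objective described in the abstract can be sketched in a few lines: binary cross-entropy over positive-negative score differences (the MW loss) alongside the normalized Mann-Whitney U statistic (AUC). This is a minimal illustrative sketch in NumPy, not the authors' implementation; the function names and the exact tie-handling convention are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mw_loss(pos_scores, neg_scores):
    # Binary cross-entropy on every pairwise difference s_pos - s_neg,
    # with target 1: each positive should outrank each negative.
    diffs = pos_scores[:, None] - neg_scores[None, :]
    return -np.mean(np.log(sigmoid(diffs)))

def auc(pos_scores, neg_scores):
    # Mann-Whitney U statistic normalized to [0, 1]: the fraction of
    # correctly ordered positive-negative pairs (ties counted as 0.5).
    diffs = pos_scores[:, None] - neg_scores[None, :]
    return np.mean((diffs > 0) + 0.5 * (diffs == 0))
```

Because -log(sigmoid(d)) >= log(2) whenever d <= 0, the mean MW loss upper-bounds log(2) times the pairwise ranking error (one minus AUC), which is the bounding relationship the abstract refers to.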