Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval
September 30, 2025
Authors: Nima Sheikholeslami, Erfan Hosseini, Patrice Bechard, Srivatsava Daruru, Sai Rajeswar
cs.AI
Abstract
Dual-encoder retrievers depend on the principle that relevant documents
should score higher than irrelevant ones for a given query. Yet the dominant
Noise Contrastive Estimation (NCE) objective, which underpins Contrastive Loss,
optimizes a softened ranking surrogate that we rigorously prove is
fundamentally oblivious to score separation quality and unrelated to AUC. This
mismatch leads to poor calibration and suboptimal performance in downstream
tasks like retrieval-augmented generation (RAG). To address this fundamental
limitation, we introduce the MW loss, a new training objective that maximizes
the Mann-Whitney U statistic, which is mathematically equivalent to the Area
under the ROC Curve (AUC). MW loss encourages each positive-negative pair to be
correctly ranked by minimizing binary cross-entropy over score differences. We
provide theoretical guarantees that MW loss directly upper-bounds the area over
the ROC curve (1 - AUC), better aligning optimization with retrieval goals. We
further promote ROC curves and AUC as natural, threshold-free diagnostics for
evaluating retriever
calibration and ranking quality. Empirically, retrievers trained with MW loss
consistently outperform contrastive counterparts in AUC and standard retrieval
metrics. Our experiments establish MW loss as a superior alternative to
Contrastive Loss, yielding better-calibrated and more discriminative
retrievers for high-stakes applications like RAG.
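To make the pairwise construction concrete, below is a minimal PyTorch sketch of the objective as described in the abstract; it is not the authors' released implementation. The function names (mw_loss, auc_estimate) and the per-query batching of positive and negative scores are illustrative assumptions.

import torch
import torch.nn.functional as F

def mw_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # Pairwise score differences d_ij = s_i^+ - s_j^-, shape (P, N).
    diffs = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)
    # Binary cross-entropy with target 1 on every difference:
    # -log(sigmoid(d_ij)), computed stably as softplus(-d_ij).
    return F.softplus(-diffs).mean()

@torch.no_grad()
def auc_estimate(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    # Mann-Whitney U estimate of AUC: the fraction of correctly
    # ordered positive-negative pairs, with ties counted as one half.
    diffs = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)
    return ((diffs > 0).float() + 0.5 * (diffs == 0).float()).mean()

# Toy usage with hypothetical retriever scores for a single query:
pos = torch.tensor([2.1, 1.4])         # scores of relevant documents
neg = torch.tensor([0.3, 1.6, -0.2])   # scores of irrelevant documents
loss = mw_loss(pos, neg)               # differentiable; backpropagate through this
print(auc_estimate(pos, neg))          # 5 of 6 pairs correctly ordered -> ~0.833

The upper-bound claim can be read off this form: whenever a pair is mis-ordered (d_ij <= 0), softplus(-d_ij) >= log 2, so the mean loss is at least (log 2)(1 - AUC), and driving the loss down pushes AUC up.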