Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval

Abstract

Dual-encoder retrievers depend on the principle that, for a given query, relevant documents should score higher than irrelevant ones. Yet the dominant Noise Contrastive Estimation (NCE) objective, which underpins Contrastive Loss, optimizes a softened ranking surrogate that we rigorously prove is fundamentally oblivious to score separation quality and unrelated to AUC. This mismatch leads to poor calibration and suboptimal performance in downstream tasks such as retrieval-augmented generation (RAG). To address this limitation, we introduce the MW loss, a new training objective that maximizes the Mann-Whitney U statistic, which is mathematically equivalent to the Area under the ROC Curve (AUC). MW loss encourages each positive-negative pair to be correctly ranked by minimizing binary cross entropy over score differences. We provide theoretical guarantees that MW loss directly upper-bounds the Area over the ROC Curve (AoC, i.e., 1 − AUC), better aligning optimization with retrieval goals. We further promote ROC curves and AUC as natural threshold-free diagnostics for evaluating retriever calibration and ranking quality. Empirically, retrievers trained with MW loss consistently outperform their contrastive counterparts in AUC and standard retrieval metrics. Our experiments show that MW loss is a superior alternative to Contrastive Loss, yielding better-calibrated and more discriminative retrievers for high-stakes applications like RAG.
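
The pairwise idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pairwise_auc` and `mw_loss` are hypothetical, and the sketch assumes that every positive score is compared against every negative score for a single query, with no temperature or batching scheme (details the abstract does not specify). The first function computes AUC as the normalized Mann-Whitney U statistic; the second averages the binary cross entropy of the score differences with target 1, which equals `log(1 + exp(-(s_pos - s_neg)))` per pair.

```python
import numpy as np


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the normalized Mann-Whitney U statistic.

    Fraction of (positive, negative) pairs in which the positive scores
    higher; ties count as one half.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    diff = pos - neg  # shape: (num_pos, num_neg)
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def mw_loss(pos_scores, neg_scores):
    """Mean binary cross entropy over all positive-negative score differences.

    Assumption: target 1 for every pair, so the per-pair loss is
    -log(sigmoid(diff)) = log(1 + exp(-diff)), computed stably via logaddexp.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    diff = pos - neg
    return float(np.mean(np.logaddexp(0.0, -diff)))


# Toy scores for one query (made-up numbers, for illustration only).
pos = [2.0, 0.5, 1.0]
neg = [0.0, 1.5]
auc = pairwise_auc(pos, neg)
loss = mw_loss(pos, neg)
print(f"AUC = {auc:.4f}, MW loss = {loss:.4f}")
```

In this sketch, dividing the loss by ln 2 gives a quantity that is at least 1 − AUC, because `log2(1 + exp(-d))` is at least 1 whenever a pair is misranked or tied. This is the standard logistic-surrogate argument and is consistent with the upper-bound claim in the abstract, although the exact statement proved in the paper may differ.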
