Active Learners as Efficient PRP Rerankers

May 15, 2026
Authors: Jeremías Figueiredo Paschmann, Juan Kaplan, Francisco Nattero, Santiago Barron, Juan Wisznia, Luciano del Corro
cs.AI

Abstract

Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
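
To make the randomized-direction idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a hypothetical judge whose probability of preferring the first-shown document follows a logistic model with an additive position-bias term; `make_biased_judge`, `fixed_direction_oracle`, `randomized_direction_oracle`, `win_rate`, and the `bias` parameter are all illustrative names, and a real system would replace the simulated judge with an LLM call. The sketch compares a fixed presentation order with a coin-flipped order on two equally relevant documents, using one judge call per comparison in both cases.

```python
import math
import random


def make_biased_judge(scores, bias, rng):
    """Simulated stand-in for an LLM judge (hypothetical, not a real API).

    Returns judge(first, second) -> True if the item shown first is preferred.
    A positive `bias` favors whichever item appears first in the prompt.
    """
    def judge(first, second):
        logit = scores[first] - scores[second] + bias
        p_first = 1.0 / (1.0 + math.exp(-logit))
        return rng.random() < p_first
    return judge


def fixed_direction_oracle(judge, a, b):
    """Always present `a` first; one call, but position bias favors `a`."""
    return judge(a, b)


def randomized_direction_oracle(judge, a, b, rng):
    """Flip a fair coin for presentation order; still one judge call per pair."""
    if rng.random() < 0.5:
        return judge(a, b)       # a shown first: answer is used as-is
    return not judge(b, a)       # b shown first: invert to get "a preferred"


def win_rate(oracle, trials):
    """Fraction of trials in which the oracle reports that `a` is preferred."""
    return sum(oracle() for _ in range(trials)) / trials


rng = random.Random(0)
scores = {"doc_a": 0.0, "doc_b": 0.0}   # equally relevant by construction
judge = make_biased_judge(scores, bias=1.0, rng=rng)

fixed = win_rate(lambda: fixed_direction_oracle(judge, "doc_a", "doc_b"), 20000)
randomized = win_rate(
    lambda: randomized_direction_oracle(judge, "doc_a", "doc_b", rng), 20000
)
print(f"fixed order: {fixed:.3f}, randomized order: {randomized:.3f}")
```

Under this assumed bias model, the fixed order systematically favors the first-shown document even though the two are tied, while the randomized order pushes the bias toward either side with equal probability, so the aggregated win rate centers on one half. Individual judgments remain noisy, which is why the aggregation step still needs to be noise-robust.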