Active Learners as Efficient PRP Rerankers

Abstract

Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from a large language model (LLM), which are then aggregated into a ranking, usually via classical sorting algorithms. However, the judgments are noisy, order-sensitive, and sometimes intransitive, so the assumptions behind sorting do not hold in this setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We therefore reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This oracle converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
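
The randomized-direction idea can be illustrated with a small simulation. The sketch below is not the paper's implementation: the `Comparator` signature, the toy `make_biased_comparator` (which stands in for an LLM call and adds a fixed bonus to whichever document is shown first), and the helper `win_rate` are all assumptions made for illustration. It shows only the core mechanism: flip a fair coin to choose the presentation order, make one call, and map the answer back to the original pair.

```python
import random
from typing import Callable, Dict

# Hypothetical signature: compare(first, second) returns True when the
# judge prefers the document shown FIRST. One call stands in for one LLM call.
Comparator = Callable[[str, str], bool]


def make_biased_comparator(
    scores: Dict[str, float], first_bonus: float, rng: random.Random
) -> Comparator:
    """Toy noisy judge (an assumption, not a real LLM) with position bias."""

    def compare(first: str, second: str) -> bool:
        # Preference depends on the score gap plus a bonus for the first slot.
        p_first = 0.5 + 0.5 * (scores[first] - scores[second]) + first_bonus
        return rng.random() < min(1.0, max(0.0, p_first))

    return compare


def randomized_direction_oracle(
    compare: Comparator, a: str, b: str, rng: random.Random
) -> bool:
    """Return True if `a` is judged better than `b`, using a single call."""
    if rng.random() < 0.5:
        return compare(a, b)       # a shown first: answer applies directly
    return not compare(b, a)       # b shown first: invert the answer


def win_rate(judge: Callable[[], bool], trials: int) -> float:
    """Fraction of trials in which the judge says `a` wins."""
    return sum(judge() for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Two equally relevant documents, so any deviation from 0.5 is bias.
    compare = make_biased_comparator({"a": 0.5, "b": 0.5}, 0.2, rng)
    fixed = win_rate(lambda: compare("a", "b"), 20_000)
    mixed = win_rate(
        lambda: randomized_direction_oracle(compare, "a", "b", rng), 20_000
    )
    print(f"fixed order: {fixed:.3f}, randomized direction: {mixed:.3f}")
```

In this toy model the fixed-order win rate for `a` is 0.7 in expectation, while the randomized-direction rate is 0.5: the first-slot bonus enters with opposite signs in the two orders and cancels on average, leaving only extra variance. The same cancellation holds for unequal scores as long as the probability is not clipped at 0 or 1.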