Active Learners as Efficient PRP Rerankers

Abstract

Pairwise Ranking Prompting (PRP) elicits pairwise preference judgments from an LLM, which are then aggregated into a ranking, usually via classical sorting algorithms. However, judgments are noisy, order-sensitive, and sometimes intransitive, so sorting assumptions do not match the setting. Because sorting aims to recover a full permutation, truncating it to meet a call budget does not produce a dependable top-K. We thus reframe PRP reranking as active learning from noisy pairwise comparisons and show that active rankers are drop-in replacements that improve NDCG@10 per call in the call-constrained regime. Our noise-robust framework also introduces a randomized-direction oracle that uses a single LLM call per pair. This approach converts systematic position bias into zero-mean noise, enabling unbiased aggregate ranking without the cost of bidirectional calls.
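
The randomized-direction idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: `toy_judge` is a hypothetical stand-in for the LLM call, item quality is a scalar score with a Bradley-Terry-style preference probability, and position bias is modeled as a constant additive boost for whichever item is shown first. All function names and parameters here are illustrative.

```python
import random


def toy_judge(first, second, rng, bias=0.2):
    """Hypothetical stand-in for one LLM call; True means `first` is preferred.

    Assumption: the true preference is Bradley-Terry-style on scalar scores,
    and the judge adds a constant bias in favor of the first-listed item.
    """
    p_true = first / (first + second)
    p = min(1.0, max(0.0, p_true + bias))
    return rng.random() < p


def fixed_order_oracle(a, b, rng):
    """Always present `a` first, so position bias always favors `a`."""
    return toy_judge(a, b, rng)


def randomized_direction_oracle(a, b, rng):
    """One judge call per pair, with the presentation order chosen by a fair coin.

    Returns True if `a` is preferred, whichever slot it was shown in.
    """
    if rng.random() < 0.5:
        return toy_judge(a, b, rng)
    # `a` was shown second, so `a` wins exactly when the judge rejects `b`.
    return not toy_judge(b, a, rng)


def win_rate(oracle, a, b, n=20000, seed=0):
    """Empirical fraction of calls in which `a` is preferred over `b`."""
    rng = random.Random(seed)
    return sum(oracle(a, b, rng) for _ in range(n)) / n


if __name__ == "__main__":
    # Two equally good items: the true preference probability is 0.5.
    print("fixed order:     ", win_rate(fixed_order_oracle, 1.0, 1.0))
    print("randomized order:", win_rate(randomized_direction_oracle, 1.0, 1.0))
```

Under this toy model, the fixed-order win rate for two tied items has expectation 0.5 + 0.2 = 0.7, while the randomized-direction win rate has expectation 0.5 * 0.7 + 0.5 * (1 - 0.7) = 0.5. The bias cancels in expectation and survives only as extra variance, at one call per pair rather than two. Whether this cancellation holds for a real LLM depends on how its position bias actually behaves, which this sketch does not model.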