Rank1: Test-Time Compute for Reranking in Information Retrieval

Abstract

We introduce Rank1, the first reranking model trained to take advantage of test-time compute. Rank1 demonstrates that, within retrieval, a reasoning language model (e.g., OpenAI's o1 or DeepSeek's R1) can be used for distillation to rapidly improve the performance of a smaller model. We gather and open-source a dataset of more than 600,000 examples of R1 reasoning traces from queries and passages in MS MARCO. Models trained on this dataset: (1) achieve state-of-the-art performance on advanced reasoning and instruction-following datasets; (2) work remarkably well out of distribution because they can respond to user-input prompts; and (3) produce explainable reasoning chains that can be given to users or RAG-based systems. Further, we demonstrate that quantized versions of these models retain strong performance while using less compute and memory. Overall, Rank1 shows that test-time compute allows for a fundamentally new type of explainable and performant reranker model for search.
