Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking

Abstract

We present a novel approach for training small language models for reasoning-intensive document ranking that combines knowledge distillation with reinforcement learning optimization. While existing methods often rely on expensive human annotations or large black-box language models, our methodology leverages web data and a teacher LLM to automatically generate high-quality training examples with relevance explanations. By framing document ranking as a reinforcement learning problem and incentivizing explicit reasoning capabilities, we train a compact 3B-parameter language model that achieves state-of-the-art performance on the BRIGHT benchmark. Our model ranks third on the leaderboard while using substantially fewer parameters than other approaches, outperforming models that are more than 20 times larger. Through extensive experiments, we demonstrate that generating explanations during inference, rather than directly predicting relevance scores, enables more effective reasoning with smaller language models. The self-supervised nature of our method offers a scalable and interpretable solution for modern information retrieval systems.
