Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking

Abstract

We present a novel approach to training small language models for reasoning-intensive document ranking that combines knowledge distillation with reinforcement learning optimization. Whereas existing methods often rely on expensive human annotations or large black-box language models, our method uses web data and a teacher LLM to automatically generate high-quality training examples with relevance explanations. By framing document ranking as a reinforcement learning problem and incentivizing explicit reasoning, we train a compact 3B-parameter language model that achieves state-of-the-art performance on the BRIGHT benchmark. Our model ranks third on the leaderboard while using substantially fewer parameters than other approaches, and it outperforms models more than 20 times larger. Through extensive experiments, we demonstrate that generating explanations during inference, rather than directly predicting relevance scores, enables more effective reasoning with smaller language models. Because the method is self-supervised, it offers a scalable and interpretable solution for modern information retrieval systems.
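As a rough illustration of what framing ranking as a reinforcement learning problem with explanation generation could look like, the sketch below scores a list of generated explanations and turns the resulting ranking into a scalar reward. The abstract does not specify the output format or the reward, so everything here is an assumption: the `Relevance: <n>` suffix, the helper names `parse_score` and `ndcg_reward`, and the use of NDCG@k as the reward are hypothetical choices, not the authors' implementation.

```python
import math
import re

# Assumed (hypothetical) output format: a free-text explanation that ends
# with "Relevance: <integer>". The paper's actual format is not given here.
SCORE_PATTERN = re.compile(r"Relevance:\s*(\d+)\s*$")


def parse_score(generation: str) -> int:
    """Extract the trailing relevance score; unparseable output counts as 0."""
    match = SCORE_PATTERN.search(generation.strip())
    return int(match.group(1)) if match else 0


def dcg(gains):
    """Discounted cumulative gain of a list of gains in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_reward(generations, true_gains, k=10):
    """Rank documents by parsed score and return NDCG@k as a scalar reward.

    generations[i] is the model output for document i and true_gains[i] is
    its reference relevance label (an assumed supervision signal).
    """
    scores = [parse_score(g) for g in generations]
    # sorted() is stable, so tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_gains = [true_gains[i] for i in order][:k]
    ideal = dcg(sorted(true_gains, reverse=True)[:k])
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    outputs = [
        "The passage restates the query without answering it. Relevance: 1",
        "The passage derives the needed result step by step. Relevance: 5",
    ]
    print(ndcg_reward(outputs, true_gains=[0, 1]))
```

Under these assumptions, a policy-gradient style trainer would sample explanations for each candidate document, compute this reward for the induced ranking, and reinforce the samples that rank relevant documents higher.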
