Query-focused and Memory-aware Reranker for Long Context Processing

Abstract

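Building on existing analyses of retrieval heads in large language models, we propose a reranking framework that trains a model to estimate passage-query relevance from the attention scores of selected heads. This yields a listwise solution that draws on holistic information from the entire candidate shortlist during ranking. The framework also produces continuous relevance scores natively, so it can be trained on arbitrary retrieval datasets without Likert-scale supervision. It is lightweight and effective, achieving strong performance with only small models (e.g., 4B parameters). Extensive experiments show that our method outperforms existing state-of-the-art pointwise and listwise rerankers across multiple domains, including Wikipedia and long narrative datasets. It also sets a new state of the art on LoCoMo, a benchmark that assesses dialogue understanding and memory usage. Finally, we show that the framework supports flexible extensions: augmenting candidate passages with contextual information improves ranking accuracy further, and training attention heads from middle layers improves efficiency without sacrificing performance.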
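The core scoring idea can be illustrated with a small sketch. It assumes that the attention weights of the selected heads have already been extracted for the query tokens, and that each candidate passage occupies a known token span inside one prompt containing the whole shortlist. For simplicity, each attention row is assumed to cover only the concatenated passages. The helper names (`passage_scores`, `rank`, `listwise_loss`), the averaging over heads and query tokens, and the loss are hypothetical illustrations, not the authors' exact formulation.

```python
import math
from typing import List, Sequence, Tuple

# attention[h][q][t]: weight from query token q to context token t for
# selected head h. Assumption: each row is a softmax over the concatenated
# candidate passages only (instruction/query tokens are ignored here).
Attention = Sequence[Sequence[Sequence[float]]]
Span = Tuple[int, int]  # half-open token range [start, end) of one passage


def passage_scores(attention: Attention, spans: Sequence[Span]) -> List[float]:
    """Hypothetical aggregation: mean, over selected heads and query tokens,
    of the attention mass that falls inside each passage span."""
    num_heads = len(attention)
    num_query = len(attention[0])  # assumes every head has the same query rows
    scores = []
    for start, end in spans:
        total = 0.0
        for head in attention:
            for row in head:
                total += sum(row[start:end])
        scores.append(total / (num_heads * num_query))
    return scores


def rank(scores: Sequence[float]) -> List[int]:
    """Passage indices sorted by descending (continuous) relevance score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def listwise_loss(scores: Sequence[float], labels: Sequence[int],
                  eps: float = 1e-9) -> float:
    """Illustrative listwise objective on binary labels (no Likert scale):
    negative log of the share of attention mass on gold passages, where the
    share is normalized over the whole shortlist."""
    total = sum(scores) + eps
    gold = sum(s for s, y in zip(scores, labels) if y == 1)
    return -math.log(gold / total + eps)


if __name__ == "__main__":
    # Two selected heads, two query tokens, and a 6-token context holding
    # three 2-token passages.
    attention = [
        [[0.1, 0.1, 0.5, 0.2, 0.05, 0.05],
         [0.0, 0.2, 0.4, 0.3, 0.05, 0.05]],
        [[0.2, 0.0, 0.3, 0.3, 0.1, 0.1],
         [0.1, 0.1, 0.6, 0.1, 0.05, 0.05]],
    ]
    spans = [(0, 2), (2, 4), (4, 6)]
    scores = passage_scores(attention, spans)
    print(rank(scores))  # [1, 0, 2]
    print(listwise_loss(scores, [0, 1, 0]))
```

Under these assumptions, every passage shares one context, so the softmax in each attention row spreads its probability mass across the whole shortlist. The resulting scores are therefore relative to the other candidates rather than computed for each passage in isolation, and they are continuous, so binary relevance labels suffice for a training signal.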
