ChatPaper.ai

Query-focused and Memory-aware Reranker for Long Context Processing

February 12, 2026
作者: Yuqing Li, Jiangnan Li, Mo Yu, Guoxuan Ding, Zheng Lin, Weiping Wang, Jie Zhou
cs.AI

Abstract

Building on existing analyses of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance using the attention scores of selected heads. This approach provides a listwise solution that leverages holistic information across the entire candidate shortlist during ranking. At the same time, it naturally produces continuous relevance scores, enabling training on arbitrary retrieval datasets without requiring Likert-scale supervision. Our framework is lightweight and effective, requiring only small-scale models (e.g., 4B parameters) to achieve strong performance. Extensive experiments demonstrate that our method outperforms existing state-of-the-art pointwise and listwise rerankers across multiple domains, including Wikipedia and long narrative datasets. It further establishes a new state of the art on the LoCoMo benchmark, which assesses dialogue understanding and memory usage. We also demonstrate that our framework supports flexible extensions: for example, augmenting candidate passages with contextual information further improves ranking accuracy, while training on attention heads from middle layers enhances efficiency without sacrificing performance.