ChatPaper.ai

Query-focused and Memory-aware Reranker for Long Context Processing

February 12, 2026
Authors: Yuqing Li, Jiangnan Li, Mo Yu, Guoxuan Ding, Zheng Lin, Weiping Wang, Jie Zhou
cs.AI

Abstract

Building on existing analyses of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage–query relevance from the attention scores of selected heads. This approach provides a listwise solution that leverages holistic information across the entire candidate shortlist during ranking. At the same time, it naturally produces continuous relevance scores, enabling training on arbitrary retrieval datasets without requiring Likert-scale supervision. Our framework is lightweight and effective, requiring only small-scale models (e.g., 4B parameters) to achieve strong performance. Extensive experiments demonstrate that our method outperforms existing state-of-the-art pointwise and listwise rerankers across multiple domains, including Wikipedia and long narrative datasets, and establishes a new state of the art on the LoCoMo benchmark, which assesses dialogue understanding and memory usage. We further demonstrate that our framework supports flexible extensions: for example, augmenting candidate passages with contextual information further improves ranking accuracy, while training on attention heads from middle layers enhances efficiency without sacrificing performance.