Spectrum Projection Score: Aligning Retrieved Summaries with Reader Models in Retrieval-Augmented Generation
August 8, 2025
Authors: Zhanghao Hu, Qinglin Zhu, Siya Qi, Yulan He, Hanqi Yan, Lin Gui
cs.AI
Abstract
Large Language Models (LLMs) have shown improved generation performance
through retrieval-augmented generation (RAG) following the retriever-reader
paradigm, which supplements model inputs with externally retrieved knowledge.
However, prior work often evaluates RAG holistically, assessing the retriever
and reader jointly, making it difficult to isolate the true contribution of
retrieval, particularly given the prompt sensitivity of LLMs used as readers.
We introduce Spectrum Projection Score (SPS), a lightweight, supervision-free
metric that lets the reader gauge the semantic alignment of a retrieved
summary with its hidden representation by comparing the area formed by the
tokens generated from the summary with the principal directions of the
reader's subspace, and uses this alignment to measure relevance. Building on
SPS, we present xCompress, an inference-time controller framework that
dynamically samples, ranks, and compresses retrieval summary candidates.
Extensive experiments on five QA benchmarks with four open-source LLMs show
that SPS not only enhances
performance across a range of tasks but also provides a principled perspective
on the interaction between retrieval and generation.