Spectrum Projection Score: Aligning Retrieved Summaries with Reader Models in Retrieval-Augmented Generation

Abstract

Large Language Models (LLMs) have shown improved generation performance through retrieval-augmented generation (RAG) following the retriever-reader paradigm, which supplements model inputs with externally retrieved knowledge. However, prior work often evaluates RAG holistically, assessing the retriever and reader jointly, which makes it difficult to isolate the true contribution of retrieval, particularly given the prompt sensitivity of LLMs used as readers. We introduce Spectrum Projection Score (SPS), a lightweight, supervision-free metric that lets the reader gauge the semantic alignment of a retrieved summary with its hidden representation, and thereby measure the summary's relevance, by comparing the area formed by the tokens generated from the summary with the principal directions of a subspace in the reader. Building on SPS, we present xCompress, an inference-time controller framework that dynamically samples, ranks, and compresses retrieval summary candidates. Extensive experiments on five QA benchmarks with four open-source LLMs show that SPS not only enhances performance across a range of tasks but also provides a principled perspective on the interaction between retrieval and generation.
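The abstract does not give the formula for SPS, so the following is only a minimal sketch of the general idea of scoring token representations against the principal directions of a subspace. It is an assumption for illustration, not the paper's definition: the function names `principal_directions` and `projection_score`, the use of an SVD over centered reference states, and the energy-ratio score are all hypothetical choices, and random arrays stand in for real reader hidden states.

```python
import numpy as np


def principal_directions(reference_states: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions (as orthonormal rows) of centered reference states.

    Hypothetical helper: the paper's actual choice of subspace is not stated
    in the abstract.
    """
    centered = reference_states - reference_states.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def projection_score(token_states: np.ndarray, directions: np.ndarray) -> float:
    """Fraction of the token states' squared norm lying in the span of `directions`.

    Assumed stand-in for an alignment score; the value lies in [0, 1] because
    the rows of `directions` are orthonormal.
    """
    total = float(np.sum(token_states ** 2))
    if total == 0.0:
        return 0.0
    coefficients = token_states @ directions.T
    return float(np.sum(coefficients ** 2) / total)


# Toy data: an 8-dimensional "hidden space" with a 2-dimensional reference subspace.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
in_subspace = q[:, :2].T    # orthonormal basis of the reference subspace
out_subspace = q[:, 2:4].T  # orthonormal basis orthogonal to it

reference = rng.normal(size=(50, 2)) @ in_subspace
directions = principal_directions(reference, k=2)

aligned_tokens = rng.normal(size=(5, 2)) @ in_subspace
orthogonal_tokens = rng.normal(size=(5, 2)) @ out_subspace

score_aligned = projection_score(aligned_tokens, directions)
score_orthogonal = projection_score(orthogonal_tokens, directions)
```

In this toy setup, tokens drawn from the reference subspace score close to 1 and tokens orthogonal to it score close to 0. In a real pipeline the arrays would presumably come from the reader model's hidden states, which this sketch does not attempt to reproduce.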
