
Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

February 12, 2026
Authors: Julia Belikova, Danila Rozhevskii, Dennis Svirin, Konstantin Polev, Alexander Panchenko
cs.AI

Abstract

Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures promise to extend effective context length by replacing long token sequences with smaller sets of learned compressed tokens. Yet the limits of compressibility, and the point at which compression begins to erase task-relevant content, remain underexplored. In this paper, we define token overflow as a regime in which compressed representations no longer contain sufficient information to answer a given query, and propose a methodology to characterize and detect it. In the xRAG soft-compression setting, we find that query-agnostic saturation statistics reliably separate compressed from uncompressed token representations, providing a practical tool for identifying compressed tokens but showing limited overflow detection capability. Lightweight probing classifiers over both query and context xRAG representations detect overflow with an average AUC-ROC of 0.72 across the HotpotQA, SQuADv2, and TriviaQA datasets, demonstrating that incorporating query information improves detection performance. These results advance from query-independent diagnostics to query-aware detectors, enabling low-cost pre-LLM gating to mitigate compression-induced errors.
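
The idea of a query-aware probe can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain logistic-regression probe over concatenated query and context vectors, and it uses synthetic Gaussian vectors in place of real xRAG representations. All function names, the data generator, and the 0.5 gating threshold are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Probability of overflow under the linear probe.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(w, b, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc_roc(scores, labels):
    """AUC-ROC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n, dim, rng, shift=0.5):
    """Hypothetical stand-in data: the label shifts the mean of both vectors."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # 1 = overflow, 0 = no overflow
        query = [rng.gauss(shift * y, 1.0) for _ in range(dim)]
        context = [rng.gauss(-shift * y, 1.0) for _ in range(dim)]
        xs.append(query + context)  # probe input: [query; context]
        ys.append(y)
    return xs, ys


def overflow_gate(w, b, query_vec, context_vec, threshold=0.5):
    """Return True when the probe predicts overflow for this query/context pair."""
    return predict_proba(w, b, query_vec + context_vec) >= threshold


rng = random.Random(0)
train_x, train_y = make_synthetic(400, 4, rng)
test_x, test_y = make_synthetic(200, 4, rng)
w, b = train_probe(train_x, train_y)
auc = auc_roc([predict_proba(w, b, x) for x in test_x], test_y)
print(f"held-out AUC-ROC on synthetic data: {auc:.2f}")
```

In a gating setup of this kind, a positive `overflow_gate` result would be checked before calling the LLM, for example to fall back to the uncompressed context. That fallback policy is an assumption here, not something the abstract specifies, and the AUC printed by this sketch reflects only the synthetic data and says nothing about the reported 0.72.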