

You Need an Encoder for Native Position-Independent Caching

February 2, 2026
Authors: Shiju Zhao, Junhao Hu, Jiaqi Zheng, Guihai Chen
cs.AI

Abstract

The Key-Value (KV) cache of Large Language Models (LLMs) is prefix-based, making it highly inefficient for processing contexts retrieved in arbitrary order. Position-Independent Caching (PIC) has been proposed to enable KV reuse without positional constraints; however, existing approaches often incur substantial accuracy degradation, limiting their practical adoption. To address this issue, we propose native PIC by reintroducing the encoder to prevalent decoder-only LLMs and explicitly training it to support PIC. We further develop COMB, a PIC-aware caching system that integrates seamlessly with existing inference frameworks. Experimental results show that COMB reduces Time-to-First-Token (TTFT) by 51-94% and increases throughput by 3× with comparable accuracy. Furthermore, the quality improvement when using DeepSeek-V2-Lite-Chat demonstrates the applicability of COMB to other types of decoder-only LLMs. Our code is available at https://github.com/shijuzhao/Comb.
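To make the positional constraint concrete, below is a minimal, hypothetical Python sketch showing why a prefix-keyed KV cache cannot reuse entries when the same retrieved chunks arrive in a different order, and how keying each chunk independently (the rough idea behind position-independent caching) makes reuse order-insensitive. The helpers `prefix_key` and `chunk_keys` are illustrative assumptions, not COMB's actual mechanism, which relies on an explicitly trained encoder rather than content hashing.

```python
from hashlib import sha256

def prefix_key(tokens):
    """Cache key for a prefix-based KV cache: depends on the entire token prefix."""
    return sha256(" ".join(map(str, tokens)).encode()).hexdigest()

# Two requests retrieve the same two chunks, but in a different order
# (typical for retrieval-augmented generation).
chunk_a = [101, 102, 103]
chunk_b = [201, 202, 203]

req1 = chunk_a + chunk_b
req2 = chunk_b + chunk_a

# Prefix-based reuse: the keys differ, so request 2 cannot reuse request 1's KV entries.
print(prefix_key(req1) == prefix_key(req2))  # False -> full recomputation

# Position-independent idea (illustrative only): key each chunk by its own content,
# so KV entries computed for chunk_a and chunk_b are reusable regardless of order.
def chunk_keys(chunks):
    return {sha256(" ".join(map(str, c)).encode()).hexdigest() for c in chunks}

print(chunk_keys([chunk_a, chunk_b]) == chunk_keys([chunk_b, chunk_a]))  # True -> reuse
```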