Parallel Context-of-Experts Decoding for Retrieval Augmented Generation
January 13, 2026
Authors: Giulio Corallo, Paolo Papotti
cs.AI
Abstract
Retrieval Augmented Generation (RAG) faces a trade-off: concatenating documents into one long prompt enables multi-document reasoning but creates a prefill bottleneck, while encoding each document's KV cache separately is faster but breaks cross-document interaction. We propose Parallel Context-of-Experts Decoding (Pced), a training-free framework that shifts evidence aggregation from the attention mechanism to the decoding step. Pced treats retrieved documents as isolated "experts" and synchronizes their predictions via a novel retrieval-aware contrastive decoding rule that weighs expert logits against the model prior. This approach recovers cross-document reasoning capabilities without constructing shared attention across documents.
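The abstract does not spell out the decoding rule, so the following is only a minimal sketch of what one step of retrieval-aware contrastive decoding over per-document experts could look like. Everything specific here is an assumption for illustration, not the authors' formulation: the scoring form `(1 + beta) * expert - beta * prior + gamma * log(retrieval_score)`, the `beta` and `gamma` weights, the greedy max over experts and tokens, and the function names.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_expert_step(expert_logits, prior_logits, retrieval_scores,
                            beta=1.0, gamma=1.0):
    """Choose the next token from per-document experts (illustrative only).

    expert_logits:    one logit vector per retrieved document, i.e. the model
                      conditioned on that document alone.
    prior_logits:     logit vector from the same model with no document.
    retrieval_scores: positive relevance score per document (assumed to come
                      from the retriever).
    beta, gamma:      hypothetical weights for the contrast and retrieval terms.
    Returns (token_id, expert_index) with the highest contrastive score.
    """
    prior_lp = log_softmax(prior_logits)
    best = None
    for k, (logits, score) in enumerate(zip(expert_logits, retrieval_scores)):
        expert_lp = log_softmax(logits)
        bonus = gamma * math.log(score)  # favor highly ranked documents
        for tok, (e, p) in enumerate(zip(expert_lp, prior_lp)):
            # Reward tokens the document makes more likely than the prior does.
            s = (1.0 + beta) * e - beta * p + bonus
            if best is None or s > best[0]:
                best = (s, tok, k)
    return best[1], best[2]


# Toy vocabulary of 3 tokens and 2 documents.
prior = [2.0, 0.0, 0.0]        # without context, the model prefers token 0
experts = [
    [2.0, 0.0, 0.0],           # document 0 adds nothing beyond the prior
    [1.0, 3.0, 0.0],           # document 1 strongly supports token 1
]
scores = [0.2, 0.8]            # hypothetical retriever relevance scores

token, expert = contrastive_expert_step(experts, prior, scores)
print(token, expert)
```

In this toy case the second document wins with token 1, because it shifts probability away from the prior and has the higher retrieval score. Each expert only needs its own document's KV cache, so no attention across documents is computed; the documents interact only through the token chosen at each step.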