Parallel Context-of-Experts Decoding for Retrieval Augmented Generation

January 13, 2026
Authors: Giulio Corallo, Paolo Papotti
cs.AI

Abstract

Retrieval Augmented Generation faces a trade-off: concatenating documents in a long prompt enables multi-document reasoning but creates prefill bottlenecks, while encoding document KV caches separately offers speed but breaks cross-document interaction. We propose Parallel Context-of-Experts Decoding (Pced), a training-free framework that shifts evidence aggregation from the attention mechanism to the decoding step. Pced treats retrieved documents as isolated "experts", synchronizing their predictions via a novel retrieval-aware contrastive decoding rule that weighs expert logits against the model prior. This approach recovers cross-document reasoning capabilities without constructing shared attention across documents.
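
The abstract does not spell out the decoding rule, so the following is only a minimal sketch of what one decoding step of this kind might look like, not the authors' formulation. It assumes each retrieved document yields its own next-token logits (one "expert" per document), that the model prior is the logits obtained with no document in context, and that a retrieval score per document is available. The function name `contrastive_expert_step`, the contrast strength `alpha`, the softmax over retrieval scores, and the max-over-experts aggregation are all hypothetical choices made for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def contrastive_expert_step(expert_logits, prior_logits, retrieval_scores, alpha=1.0):
    """Pick the next token from K per-document experts (illustrative only).

    expert_logits:    (K, V) next-token logits, one row per retrieved document
    prior_logits:     (V,)   next-token logits with no document in context
    retrieval_scores: (K,)   retriever relevance score per document
    alpha:            assumed contrast strength (hypothetical parameter)
    """
    expert_logp = log_softmax(np.asarray(expert_logits, dtype=float))
    prior_logp = log_softmax(np.asarray(prior_logits, dtype=float))

    # Contrastive term: reward tokens an expert prefers more than the prior does.
    contrast = (1.0 + alpha) * expert_logp - alpha * prior_logp

    # Retrieval-aware term: log of softmax-normalised retrieval scores (assumption).
    doc_weight = log_softmax(np.asarray(retrieval_scores, dtype=float))

    # Each expert scores every token; keep the best-supported score per token.
    scores = contrast + doc_weight[:, None]
    combined = scores.max(axis=0)
    return int(combined.argmax()), combined


# Toy example: the prior and document 0 favour token 0; document 1 favours token 1.
prior = [2.0, 0.0, 0.0, 0.0]
experts = [[2.0, 0.0, 0.0, 0.0],
           [2.0, 3.0, 0.0, 0.0]]
token, _ = contrastive_expert_step(experts, prior, retrieval_scores=[0.0, 0.0])
print(token)
```

In this toy setup, the expert whose document changes the distribution relative to the prior is the one that decides the next token, and no attention is computed across documents. Real use would take the logits from separate forward passes over per-document KV caches.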