Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding

February 5, 2026
Authors: Yanzheng Xiang, Lan Wei, Yizhen Yao, Qinglin Zhu, Hanqi Yan, Chen Jin, Philip Alexander Teare, Dandan Zhang, Lin Gui, Amrutha Saseendran, Yulan He
cs.AI

Abstract

Parallel diffusion decoding can accelerate diffusion language model inference by unmasking multiple tokens per step, but aggressive parallelism often harms quality. Revocable decoding mitigates this by rechecking earlier tokens, yet we observe that existing verification schemes frequently trigger flip-flop oscillations, in which tokens are remasked and later restored unchanged. This behaviour slows inference in two ways: remasking verified positions weakens the conditioning context for parallel drafting, and repeated remask cycles consume the revision budget with little net progress. We propose COVER (Cache Override Verification for Efficient Revision), which performs leave-one-out verification and stable drafting within a single forward pass. COVER constructs two attention views via KV cache override: selected seeds are masked for verification, while their cached key-value states are injected for all other queries to preserve contextual information, with a closed-form diagonal correction preventing self-leakage at the seed positions. COVER further prioritises seeds using a stability-aware score that balances uncertainty, downstream influence, and cache drift, and it adapts the number of verified seeds per step. Across benchmarks, COVER markedly reduces unnecessary revisions and yields faster decoding while preserving output quality.
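
The abstract gives no formulas, so the following is only a minimal single-head sketch of one way the "two attention views" idea could be realised, not the authors' implementation. It assumes that `k_mask`/`v_mask` come from the current input with seeds masked, that `k_cache`/`v_cache` hold the states cached while the seed tokens were still visible, and that the "diagonal correction" swaps each seed's own cached key and value back to the masked ones. All function and variable names are hypothetical.

```python
import numpy as np

def softmax_rows(s):
    # Numerically stable row-wise softmax.
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)

def reference_two_view(q, k_mask, v_mask, k_cache, v_cache, seeds):
    # Slow per-query reference (assumed semantics): every query sees the
    # cached K/V at seed positions, except that a seed query sees the masked
    # K/V at its own position, i.e. a leave-one-out view.
    n, d = q.shape
    out = np.empty_like(v_mask)
    for i in range(n):
        k, v = k_mask.copy(), v_mask.copy()
        for j in seeds:
            if j != i:
                k[j], v[j] = k_cache[j], v_cache[j]
        w = softmax_rows((q[i:i + 1] @ k.T) / np.sqrt(d))
        out[i] = (w @ v)[0]
    return out

def override_with_diagonal_correction(q, k_mask, v_mask, k_cache, v_cache, seeds):
    # One attention pass over the overridden K/V, followed by a closed-form
    # fix of the diagonal term on seed rows only.
    n, d = q.shape
    seeds = np.asarray(seeds)
    k, v = k_mask.copy(), v_mask.copy()
    k[seeds], v[seeds] = k_cache[seeds], v_cache[seeds]  # cache override
    scores = (q @ k.T) / np.sqrt(d)
    # Score each seed query would give to its own masked key.
    diag_mask = np.einsum("id,id->i", q[seeds], k_mask[seeds]) / np.sqrt(d)
    # Shared stabiliser per row, covering the swapped-in diagonal score too.
    m = scores.max(axis=1)
    m[seeds] = np.maximum(m[seeds], diag_mask)
    e = np.exp(scores - m[:, None])
    z = e.sum(axis=1)          # softmax denominators
    num = e @ v                # unnormalised outputs
    e_cache = e[seeds, seeds]  # leaked diagonal weights on seed rows
    e_mask = np.exp(diag_mask - m[seeds])
    # Remove the cached diagonal contribution and add the masked one.
    z[seeds] += e_mask - e_cache
    num[seeds] += e_mask[:, None] * v_mask[seeds] - e_cache[:, None] * v_cache[seeds]
    return num / z[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k_m, v_m, k_c, v_c = (rng.normal(size=(6, 4)) for _ in range(5))
    ref = reference_two_view(q, k_m, v_m, k_c, v_c, [1, 4])
    fast = override_with_diagonal_correction(q, k_m, v_m, k_c, v_c, [1, 4])
    print("max abs difference:", np.abs(ref - fast).max())
```

Under these assumptions, non-seed queries keep the full cached context, while each seed row is independent of its own cached key and value, which is the leave-one-out property the abstract describes. A real model would apply this per head and per layer; the sketch ignores that, along with seed scoring and adaptive seed counts.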
February 12, 2026