Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding

Abstract

Parallel diffusion decoding can accelerate diffusion language model inference by unmasking multiple tokens per step, but aggressive parallelism often harms quality. Revocable decoding mitigates this by rechecking earlier tokens, yet we observe that existing verification schemes frequently trigger flip-flop oscillations, where tokens are remasked and later restored unchanged. This behaviour slows inference in two ways: remasking verified positions weakens the conditioning context for parallel drafting, and repeated remask cycles consume the revision budget with little net progress. We propose COVER (Cache Override Verification for Efficient Revision), which performs leave-one-out verification and stable drafting within a single forward pass. COVER constructs two attention views via KV cache override: selected seeds are masked for verification, while their cached key-value states are injected for all other queries to preserve contextual information, with a closed-form diagonal correction preventing self-leakage at the seed positions. COVER further prioritises seeds using a stability-aware score that balances uncertainty, downstream influence, and cache drift, and it adapts the number of verified seeds per step. Across benchmarks, COVER markedly reduces unnecessary revisions and yields faster decoding while preserving output quality.
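The two attention views can be illustrated with a toy single-head example. The sketch below is not taken from the paper: the function names (`attend`, `dual_view_attention`), the list-of-lists data layout, and the specific form of the correction are all assumptions made for illustration. It shows one plausible reading of the idea: every query attends to the cached key-value states of the seed positions, and each seed's own row is then corrected in closed form so that it sees its masked state rather than its own cached token.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values):
    """Single-head softmax attention for one query.

    Returns the output vector and the softmax normalizer z.
    """
    scale = 1.0 / math.sqrt(len(q))
    weights = [math.exp(dot(q, k) * scale) for k in keys]
    z = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[c] for w, v in zip(weights, values)) / z for c in range(dim)]
    return out, z

def dual_view_attention(queries, k_masked, v_masked, k_cached, v_cached, seeds):
    """Hypothetical sketch of cache-override attention with a diagonal correction.

    k_masked / v_masked: states computed with the seed positions masked.
    k_cached / v_cached: states cached while the seeds still held their tokens.
    seeds: set of positions being verified (an assumed representation).
    """
    n = len(queries)
    # Override view: seed columns use cached states, so other queries keep context.
    k_ctx = [k_cached[j] if j in seeds else k_masked[j] for j in range(n)]
    v_ctx = [v_cached[j] if j in seeds else v_masked[j] for j in range(n)]
    outputs = []
    for i, q in enumerate(queries):
        out, z = attend(q, k_ctx, v_ctx)
        if i in seeds:
            # Diagonal correction: remove the seed's own cached term and add
            # its masked term, so the seed cannot read its own token.
            scale = 1.0 / math.sqrt(len(q))
            w_old = math.exp(dot(q, k_cached[i]) * scale)
            w_new = math.exp(dot(q, k_masked[i]) * scale)
            z_new = z - w_old + w_new
            out = [(z * o - w_old * vc + w_new * vm) / z_new
                   for o, vc, vm in zip(out, v_cached[i], v_masked[i])]
        outputs.append(out)
    return outputs

# Toy data: four positions, two of which (1 and 3) are seeds under verification.
queries = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
k_masked = [[0.2, 0.1], [0.05, 0.05], [0.4, -0.3], [0.05, 0.05]]
v_masked = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
k_cached = [[0.2, 0.1], [0.6, -0.2], [0.4, -0.3], [-0.1, 0.7]]
v_cached = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 2.0]]
seeds = {1, 3}

outputs = dual_view_attention(queries, k_masked, v_masked, k_cached, v_cached, seeds)
```

Under these assumptions, each seed row matches a leave-one-out computation in which only that seed is masked and every other seed keeps its cached state, while non-seed rows see the full cached context. The correction only touches one term per seed row, which is what would allow both views to share a single attention pass.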
