Stop the Flip-Flop: Context-Preserving Verification for Fast Revocable Diffusion Decoding

Abstract


Parallel diffusion decoding can accelerate diffusion language model inference by unmasking multiple tokens per step, but aggressive parallelism often degrades quality. Revocable decoding mitigates this by rechecking earlier tokens, yet we observe that existing verification schemes frequently trigger flip-flop oscillations, in which tokens are remasked and later restored unchanged. This behaviour slows inference in two ways: remasking verified positions weakens the conditioning context for parallel drafting, and repeated remask cycles consume the revision budget with little net progress. We propose COVER (Cache Override Verification for Efficient Revision), which performs leave-one-out verification and stable drafting within a single forward pass. COVER constructs two attention views via KV cache override: selected seeds are masked for verification, while their cached key-value states are injected for all other queries to preserve contextual information, and a closed-form diagonal correction prevents self-leakage at the seed positions. COVER further prioritises seeds using a stability-aware score that balances uncertainty, downstream influence, and cache drift, and it adapts the number of verified seeds per step. Across benchmarks, COVER markedly reduces unnecessary revisions and yields faster decoding while preserving output quality.
