マルチターン反射的マスキングによるマスク拡散モデルにおける推論の誘発

要旨

自己回帰（AR）モデルにおける推論は、しばしば連鎖思考推論やリフレクションによって実行されるが、以前の出力の改良は、局所的な編集のみが必要な場合でも、依然として完全な逐次生成に依存している。対照的に、マスク拡散モデル（MDM）におけるマスキング機構は、以前の出力に対する明示的な局所編集を自然にサポートし、以前の回答を破棄して最初から別の回答を生成することなく、選択的な改良を可能にする。この特性は、人間が反復的な局所改良によって誤りを修正する方法とより密接に一致するが、既存のMDMはマルチターンのマスキングとノイズ除去をサポートしていない。我々は、軽量なポストトレーニングを介してMDMにそのような内在的な推論能力を引き出すリフレクティブ・マスキング（RM）を提案する。RMは、MDMが進化するコンテキストに基づいて以前の出力を反復的に再訪・修正する、ネイティブなテスト時スケーリングを提供する。AR推論のように前回のターンからの洞察を活用するために、さらに履歴参照（History Reference）を導入する。これは、修正中の中間ノイズ除去状態を活用するパラメータフリーの機構である。我々のアプローチはアーキテクチャ変更を必要とせず、既存のMDMに容易に適用可能である。テキスト生成、数独、画像編集を含む多様なタスクとモダリティにおいて、リフレクティブ・マスキングは標準的なマスキングベースのベースラインを一貫して上回り、強い汎用性を示す。これにより、RMはMDMにおける推論のための基本的なプリミティブとして位置づけられる。

English

While reasoning on autoregressive (AR) models is often performed by chain-of-thought reasoning and reflection, their refinement of previous outputs still relies on fully sequential generation, even when only local edits are needed. In contrast, the masking mechanism in Mask Diffusion Models (MDMs) naturally supports explicit local edits on previous outputs, allowing selective refinement without discarding previous answers and generating another from scratch. While this property more closely aligns with how humans correct mistakes by iterative local refinement, existing MDMs do not support multi-turn masking and denoising. We propose Reflective Masking (RM), which elicits such an intrinsic reasoning capability in MDMs via lightweight post-training. RM provides a native test-time scaling, where an MDM iteratively revisits and revises its prior outputs based on evolving context. To exploit insights from previous turns like AR reasoning, we further introduce History Reference, a parameter-free mechanism that leverages intermediate denoising states during revision. Our approach requires no architectural changes and is easily applicable to existing MDMs. Across diverse tasks and modalities, including text generation, Sudoku, and image editing, Reflective Masking consistently outperforms standard masking-based baselines and demonstrates strong generality, positioning RM as a fundamental primitive for reasoning on MDMs.