다중 회차 반영적 마스킹이 마스크 확산 모델에서 추론을 유발한다

초록

자기회귀(AR) 모델에서의 추론은 종종 사고 연쇄 추론과 반성을 통해 수행되지만, 이전 출력물의 개선은 여전히 완전한 순차적 생성에 의존하며, 이는 국소적 편집만 필요한 경우에도 마찬가지이다. 반면, 마스크 확산 모델(MDM)의 마스킹 메커니즘은 이전 출력물에 대한 명시적 국소 편집을 자연스럽게 지원하여, 이전 답변을 폐기하고 처음부터 다시 생성하는 대신 선택적 개선을 가능하게 한다. 이러한 특성은 인간이 반복적인 국소 개선을 통해 오류를 수정하는 방식과 더욱 밀접하게 일치하지만, 기존 MDM은 다중 턴 마스킹 및 잡음 제거를 지원하지 않는다. 본 연구에서는 경량 사후 훈련을 통해 MDM의 본질적 추론 능력을 이끌어내는 **반사적 마스킹(RM)**을 제안한다. RM은 네이티브 테스트 시간 스케일링을 제공하며, MDM이 진화하는 맥락에 기반하여 이전 출력물을 반복적으로 재검토하고 수정한다. AR 추론에서의 이전 턴으로부터의 통찰을 활용하기 위해, **히스토리 참조**라는 매개변수 없는 메커니즘을 추가로 도입하여 수정 과정에서 중간 잡음 제거 상태를 활용한다. 본 접근법은 아키텍처 변경이 필요 없으며 기존 MDM에 쉽게 적용 가능하다. 텍스트 생성, 스도쿠, 이미지 편집 등 다양한 작업과 모달리티에 걸쳐 반사적 마스킹은 표준 마스킹 기반 베이스라인을 일관되게 능가하며 강력한 일반성을 입증함으로써, RM을 MDM에서의 추론을 위한 기본 프리미티브로 자리매김한다.

English

While reasoning on autoregressive (AR) models is often performed by chain-of-thought reasoning and reflection, their refinement of previous outputs still relies on fully sequential generation, even when only local edits are needed. In contrast, the masking mechanism in Mask Diffusion Models (MDMs) naturally supports explicit local edits on previous outputs, allowing selective refinement without discarding previous answers and generating another from scratch. While this property more closely aligns with how humans correct mistakes by iterative local refinement, existing MDMs do not support multi-turn masking and denoising. We propose Reflective Masking (RM), which elicits such an intrinsic reasoning capability in MDMs via lightweight post-training. RM provides a native test-time scaling, where an MDM iteratively revisits and revises its prior outputs based on evolving context. To exploit insights from previous turns like AR reasoning, we further introduce History Reference, a parameter-free mechanism that leverages intermediate denoising states during revision. Our approach requires no architectural changes and is easily applicable to existing MDMs. Across diverse tasks and modalities, including text generation, Sudoku, and image editing, Reflective Masking consistently outperforms standard masking-based baselines and demonstrates strong generality, positioning RM as a fundamental primitive for reasoning on MDMs.