DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Abstract

Discrete diffusion models have succeeded at tasks such as image generation and masked language modeling, but they remain limited in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models, including multinomial diffusion and masked generative models. By recording noise sequences and masking patterns during the reverse diffusion process, DICE enables accurate reconstruction and flexible editing of discrete data without predefined masks or attention manipulation. We demonstrate the effectiveness of DICE in both the image and text domains, evaluating it on models such as VQ-Diffusion, Paella, and RoBERTa. Our results show that DICE preserves high data fidelity while enhancing editing capabilities, offering new opportunities for fine-grained content manipulation in discrete spaces. Project webpage: https://hexiaoxiao-cs.github.io/DICE/
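The idea of recording noise so that a discrete sampling step can be replayed exactly can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a single categorical sampling step done with the Gumbel-max trick, uses random logits as a stand-in for a denoiser's output, and all names (`gumbel_max_sample`, `replay`, `recorded_noise`) are hypothetical. It only shows that replaying stored noise with the same logits reproduces the tokens, while changing the logits at one position changes only that position.

```python
import numpy as np


def gumbel_max_sample(logits, rng):
    """Sample one token per position with the Gumbel-max trick.

    Returns the sampled tokens and the Gumbel noise that produced them,
    so the draw can be replayed later.
    """
    noise = rng.gumbel(size=logits.shape)
    tokens = np.argmax(logits + noise, axis=-1)
    return tokens, noise


def replay(logits, noise):
    """Deterministically re-run the sampling step with previously recorded noise."""
    return np.argmax(logits + noise, axis=-1)


rng = np.random.default_rng(0)
seq_len, vocab_size = 8, 16

# Assumption: random logits stand in for a model's per-position predictions.
logits = rng.normal(size=(seq_len, vocab_size))

# Record the noise while sampling.
tokens, recorded_noise = gumbel_max_sample(logits, rng)

# Reconstruction: same logits + recorded noise gives back the same tokens.
reconstructed = replay(logits, recorded_noise)

# Editing: push position 3 strongly toward token 5 and reuse the recorded noise.
edited_logits = logits.copy()
edited_logits[3, 5] += 100.0
edited = replay(edited_logits, recorded_noise)

print("original:     ", tokens)
print("reconstructed:", reconstructed)
print("edited:       ", edited)
```

In this toy setting the unedited positions keep their original tokens because both their logits and their recorded noise are unchanged. A full discrete diffusion or masked generative model would repeat such a step over many timesteps and, for masked models, would also need to store the masking pattern.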
