DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
October 10, 2024
Authors: Xiaoxiao He, Ligong Han, Quan Dao, Song Wen, Minhao Bai, Di Liu, Han Zhang, Martin Renqiang Min, Felix Juefei-Xu, Chaowei Tan, Bo Liu, Kang Li, Hongdong Li, Junzhou Huang, Faez Ahmed, Akash Srivastava, Dimitris Metaxas
cs.AI
Abstract
Discrete diffusion models have achieved success in tasks like image
generation and masked language modeling but face limitations in controlled
content editing. We introduce DICE (Discrete Inversion for Controllable
Editing), the first approach to enable precise inversion for discrete diffusion
models, including multinomial diffusion and masked generative models. By
recording noise sequences and masking patterns during the reverse diffusion
process, DICE enables accurate reconstruction and flexible editing of discrete
data without the need for predefined masks or attention manipulation. We
demonstrate the effectiveness of DICE across both image and text domains,
evaluating it on models such as VQ-Diffusion, Paella, and RoBERTa. Our results
show that DICE preserves high data fidelity while enhancing editing
capabilities, offering new opportunities for fine-grained content manipulation
in discrete spaces. For the project webpage, see
https://hexiaoxiao-cs.github.io/DICE/.
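The idea of recording noise so that a discrete sample can be reproduced and then edited can be illustrated with a toy sketch. This is not the DICE algorithm itself: the helper names `record_noise` and `replay`, the `margin` parameter, and the random logits standing in for a model's predictions are all assumptions made for illustration. The sketch stores a residual noise term per position such that adding it back to the original logits recovers the original tokens exactly, and reusing the same noise with modified logits changes only the positions where the modification is strong enough.

```python
import numpy as np

def record_noise(logits, tokens, margin=4.0):
    """Hypothetical helper: residual noise z with argmax(logits + z) == tokens."""
    one_hot = np.eye(logits.shape[-1])[tokens]  # shape (positions, vocab)
    return margin * one_hot - logits

def replay(logits, noise):
    """Hypothetical helper: re-sample tokens using previously recorded noise."""
    return np.argmax(logits + noise, axis=-1)

rng = np.random.default_rng(0)
vocab_size, num_positions = 10, 6
# Random logits stand in for a model's per-position predictions (assumption).
source_logits = rng.normal(size=(num_positions, vocab_size))
tokens = rng.integers(0, vocab_size, size=num_positions)

# "Inversion": record the noise that explains the observed tokens.
noise = record_noise(source_logits, tokens)
reconstructed = replay(source_logits, noise)

# "Editing": a new condition shifts the logits at position 2 toward another token.
new_token = (tokens[2] + 1) % vocab_size
edited_logits = source_logits.copy()
edited_logits[2, new_token] += 10.0  # shift larger than the margin
edited = replay(edited_logits, noise)
```

In this toy setting, `reconstructed` matches `tokens` at every position, while `edited` differs from `tokens` only at position 2. The `margin` value controls how strongly the recorded noise anchors the output to the original tokens: a larger margin needs a larger logit shift before an edit takes effect.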