Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models
May 25, 2023
Authors: Jooyoung Choi, Yunjey Choi, Yunji Kim, Junho Kim, Sungroh Yoon
cs.AI
Abstract
Text-to-image diffusion models can generate diverse, high-fidelity images
based on user-provided text prompts. Recent research has extended these models
to support text-guided image editing. While text guidance is an intuitive
editing interface for users, it often fails to precisely capture the
concept the user intends. To address this issue, we propose Custom-Edit, in which we
(i) customize a diffusion model with a few reference images and then (ii)
perform text-guided editing. Our key discovery is that customizing only
language-relevant parameters with augmented prompts improves reference
similarity significantly while maintaining source similarity. Moreover, we
provide our recipe for each customization and editing process. We compare
popular customization methods and validate our findings on two editing methods
using various datasets.
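The key finding above, customizing only the language-relevant parameters, can be sketched as a parameter filter over a diffusion U-Net. The sketch below assumes diffusers-style parameter naming, where `attn2` denotes the cross-attention block that attends over text embeddings, so its `to_k`/`to_v` projections are the natural "language-relevant" weights; the naming convention and helper are illustrative assumptions, not taken from the paper.

```python
def select_language_relevant(param_names):
    """Keep only cross-attention key/value projection parameters.

    Assumption: diffusers-style U-Net naming, where `attn2` is the
    cross-attention layer whose key/value inputs come from the text
    encoder. All other parameters would stay frozen during customization.
    """
    return [n for n in param_names
            if "attn2.to_k" in n or "attn2.to_v" in n]

# Hypothetical subset of U-Net parameter names for illustration.
names = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight",
    "mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight",
]
trainable = select_language_relevant(names)
# Only the two attn2 key/value entries are selected for fine-tuning.
```

Training only this small, text-conditioned subset is what lets reference similarity improve without drifting far from the source distribution.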