Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

September 2, 2024
Authors: Vadim Titov, Madina Khalmatova, Alexandra Ivanova, Dmitry Vetrov, Aibek Alanov
cs.AI

Abstract

Despite recent advances in large-scale text-to-image generative models, manipulating real images with these models remains a challenging problem. The main limitations of existing editing methods are that they either fail to deliver consistent quality across a wide range of image edits or require time-consuming hyperparameter tuning or fine-tuning of the diffusion model to preserve the image-specific appearance of the input image. We propose a novel approach built upon a diffusion sampling process modified via the guidance mechanism. In this work, we explore the self-guidance technique to preserve the overall structure of the input image and the appearance of its local regions that should not be edited. In particular, we explicitly introduce layout-preserving energy functions aimed at preserving the local and global structures of the source image. Additionally, we propose a noise rescaling mechanism that preserves the noise distribution by balancing the norms of classifier-free guidance and our proposed guiders during generation. This guiding approach requires neither fine-tuning of the diffusion model nor an exact inversion process. As a result, the proposed method provides a fast, high-quality editing mechanism. In our experiments, we show through human evaluation and quantitative analysis that the proposed method produces the desired edits, which human evaluators prefer, and achieves a better trade-off between editing quality and preservation of the original image. Our code is available at https://github.com/FusionBrainLab/Guide-and-Rescale.
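The abstract describes the noise rescaling mechanism only at a high level: the guider term is balanced against classifier-free guidance (CFG) by norm. The snippet below is a minimal sketch of one plausible reading, assuming the guider gradient is rescaled so that its L2 norm matches the norm of the CFG direction before being added to the noise prediction. The function name `rescaled_guidance`, the parameters `cfg_scale` and `guider_scale`, and the exact norm-matching rule are hypothetical illustrations, not the authors' implementation or the API of the linked repository.

```python
import numpy as np


def rescaled_guidance(eps_uncond, eps_cond, guider_grad,
                      cfg_scale=7.5, guider_scale=1.0, eps=1e-8):
    """Hypothetical sketch: combine CFG with a norm-rescaled guider term.

    eps_uncond  -- unconditional noise prediction (any array shape)
    eps_cond    -- text-conditional noise prediction (same shape)
    guider_grad -- gradient of an assumed layout-preserving energy w.r.t. the latent
    """
    # Classifier-free guidance direction.
    cfg_dir = eps_cond - eps_uncond
    cfg_norm = np.linalg.norm(cfg_dir)
    guider_norm = np.linalg.norm(guider_grad)

    # Assumed rescaling rule: bring the guider term to the same L2 norm as the
    # CFG direction so it neither dominates nor vanishes; `eps` avoids division by zero.
    ratio = cfg_norm / (guider_norm + eps)

    return eps_uncond + cfg_scale * cfg_dir + guider_scale * ratio * guider_grad


if __name__ == "__main__":
    out = rescaled_guidance(np.zeros(2), np.array([3.0, 4.0]),
                            np.array([0.0, 10.0]), cfg_scale=1.0)
    print(out)  # guider term rescaled from norm 10 down to norm 5
```

In this toy call, the CFG direction `[3, 4]` has norm 5 and the raw guider gradient `[0, 10]` has norm 10, so the guider contribution is halved to roughly `[0, 5]`. Matching norms keeps the combined update on a scale comparable to ordinary CFG, which is one way to read "preserving the noise distribution."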