

Learning an Image Editing Model without Image Editing Pairs

October 16, 2025
Authors: Nupur Kumari, Sheng-Yu Wang, Nanxuan Zhao, Yotam Nitzan, Yuheng Li, Krishna Kumar Singh, Richard Zhang, Eli Shechtman, Jun-Yan Zhu, Xun Huang
cs.AI

Abstract

Recent image editing models have achieved impressive results while following natural language editing instructions, but they rely on supervised fine-tuning with large datasets of input-target pairs. This is a critical bottleneck, as such naturally occurring pairs are hard to curate at scale. Current workarounds use synthetic training pairs that leverage the zero-shot capabilities of existing models. However, this can propagate and magnify the artifacts of the pretrained model into the final trained model. In this work, we present a new training paradigm that eliminates the need for paired data entirely. Our approach directly optimizes a few-step diffusion model by unrolling it during training and leveraging feedback from vision-language models (VLMs). For each input and editing instruction, the VLM evaluates if an edit follows the instruction and preserves unchanged content, providing direct gradients for end-to-end optimization. To ensure visual fidelity, we incorporate distribution matching loss (DMD), which constrains generated images to remain within the image manifold learned by pretrained models. We evaluate our method on standard benchmarks and include an extensive ablation study. Without any paired data, our method performs on par with various image editing diffusion models trained on extensive supervised paired data, under the few-step setting. Given the same VLM as the reward model, we also outperform RL-based techniques like Flow-GRPO.
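To make the training recipe concrete, below is a minimal, hypothetical PyTorch sketch of one optimization step combining the three ingredients the abstract describes: unrolling a few-step editor, scoring the result with VLM-style feedback, and adding a distribution-matching regularizer. Everything here is an illustrative assumption rather than the authors' implementation: `FewStepEditor`, `VLMScorer`, and the moment-matching `dmd_proxy` are stand-ins, whereas the paper uses an actual vision-language model for the reward and a pretrained diffusion model for the DMD term.

```python
import torch
import torch.nn as nn

class FewStepEditor(nn.Module):
    """Hypothetical few-step editor, unrolled so gradients flow through every sampling step."""
    def __init__(self, dim=64, steps=4):
        super().__init__()
        self.steps = steps
        self.net = nn.Sequential(nn.Linear(dim * 3, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, source, instruction_emb):
        x = torch.randn_like(source)  # start from noise, conditioned on source + instruction
        for _ in range(self.steps):   # unrolled few-step sampling
            x = x + self.net(torch.cat([x, source, instruction_emb], dim=-1))
        return x

class VLMScorer(nn.Module):
    """Hypothetical differentiable stand-in for VLM feedback: one score covering
    instruction-following and preservation of unchanged content."""
    def __init__(self, dim=64):
        super().__init__()
        self.head = nn.Linear(dim * 3, 1)

    def forward(self, edited, source, instruction_emb):
        return self.head(torch.cat([edited, source, instruction_emb], dim=-1)).squeeze(-1)

def dmd_proxy(edited, real_batch):
    """Moment-matching proxy for the distribution matching (DMD) term; the paper
    instead uses a pretrained diffusion model to keep edits on the image manifold."""
    return (edited.mean(dim=0) - real_batch.mean(dim=0)).pow(2).mean()

# One training step on unpaired data: source images and instructions, no target edits.
dim, batch = 64, 8
editor, scorer = FewStepEditor(dim), VLMScorer(dim)
opt = torch.optim.AdamW(editor.parameters(), lr=1e-4)

source = torch.randn(batch, dim)   # source-image features (illustrative)
instr = torch.randn(batch, dim)    # edit-instruction embeddings (illustrative)
real = torch.randn(batch, dim)     # unpaired real images for the DMD-style term

edited = editor(source, instr)                   # unroll the editor end to end
reward = scorer(edited, source, instr).mean()    # VLM-style feedback, higher is better
loss = -reward + 0.5 * dmd_proxy(edited, real)   # maximize reward, stay on the manifold

opt.zero_grad()
loss.backward()   # direct gradients through all unrolled steps, no paired supervision
opt.step()
```

The point of the sketch is the gradient path: because the few-step sampler is unrolled inside the forward pass, the reward and DMD-style losses backpropagate through every denoising step, which is what lets the method dispense with input-target pairs entirely.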