Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

Abstract

Balancing fidelity and editability is essential in text-based image editing (TIE): when the balance fails, the result is over- or under-edited. Existing methods typically rely on attention injection to preserve structure and on the inherent text-alignment capability of pre-trained text-to-image (T2I) models for editability, but they lack an explicit, unified mechanism for balancing the two objectives. We introduce UnifyEdit, a tuning-free method that optimizes the diffusion latent to balance fidelity and editability within a single framework. Instead of injecting attention directly, we define two attention-based constraints: a self-attention (SA) preservation constraint for structural fidelity, and a cross-attention (CA) alignment constraint that strengthens text alignment and thus editability. Applying both constraints at once can cause gradient conflicts, in which one constraint dominates and the result is over- or under-edited. To address this, we introduce an adaptive time-step scheduler that dynamically adjusts the influence of each constraint, guiding the diffusion latent toward an optimal balance. Extensive quantitative and qualitative experiments across various editing tasks show that UnifyEdit balances structure preservation and text alignment more robustly than other state-of-the-art methods. The source code will be available at https://github.com/CUC-MIPG/UnifyEdit.
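The gradient-conflict idea can be illustrated with a toy sketch. It is not the UnifyEdit implementation: the abstract does not specify the constraint losses or the scheduler, so everything here is an assumption made for illustration. Two quadratic losses stand in for the SA preservation and CA alignment constraints, a plain list of floats stands in for the diffusion latent, and the names `sq_dist`, `adaptive_weights`, and `optimize_latent` are hypothetical.

```python
def sq_dist(a, b):
    """Toy constraint loss: 0.5 * ||a - b||^2."""
    return 0.5 * sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_weights(loss_preserve, loss_align, eps=1e-12):
    """Hypothetical balancing rule (an assumption, not the paper's scheduler):
    weight each constraint by its share of the total loss, so the constraint
    that is currently violated more gets more influence."""
    total = loss_preserve + loss_align + eps
    return loss_preserve / total, loss_align / total


def optimize_latent(z_src, z_tgt, num_steps=50, lr=0.3, weights=None):
    """Gradient descent on a toy latent under two conflicting constraints.

    z_src stands in for the latent favored by the preservation constraint,
    z_tgt for the latent favored by the alignment constraint. If `weights`
    is None the weights are recomputed at every step; otherwise the fixed
    pair (w_preserve, w_align) is used throughout.
    """
    z = list(z_src)
    for _ in range(num_steps):
        if weights is None:
            w_p, w_a = adaptive_weights(sq_dist(z, z_src), sq_dist(z, z_tgt))
        else:
            w_p, w_a = weights
        # The gradient of each quadratic loss is (z - target); the two
        # gradients point in opposite directions between the targets.
        z = [
            zi - lr * (w_p * (zi - s) + w_a * (zi - t))
            for zi, s, t in zip(z, z_src, z_tgt)
        ]
    return z


z_src = [0.0, 1.0, -2.0]
z_tgt = [1.0, 3.0, 2.0]
balanced = optimize_latent(z_src, z_tgt)
under_edited = optimize_latent(z_src, z_tgt, weights=(0.95, 0.05))
print(balanced)
print(under_edited)
```

In this toy setting the adaptive rule settles at the midpoint between the two targets, while fixing the preservation weight at 0.95 leaves the latent close to the source, a rough analogue of under-editing when one constraint dominates.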
