EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model

Abstract

We propose EditCrafter, a tuning-free high-resolution image editing method that leverages pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. The generative priors of large-scale T2I diffusion models enable a wide array of novel generation and editing applications. Although numerous diffusion-based image editing methods have been proposed and produce high-quality results, they are difficult to apply to images with arbitrary aspect ratios or higher resolutions because they only work at the training resolutions (512x512 or 1024x1024). Naively applying patch-wise editing fails, producing unrealistic object structures and repeated content. To address these challenges, EditCrafter adopts a simple yet effective editing pipeline. It first performs tiled inversion, which preserves the original identity of the input high-resolution image. We further propose noise-damped manifold-constrained classifier-free guidance (NDCFG++), which is tailored for high-resolution image editing from the inverted latent. Our experiments show that EditCrafter achieves impressive editing results across various resolutions without fine-tuning or optimization.
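
The abstract does not say how tiles are sized or blended, so the following is only a minimal Python sketch of generic overlapping-tile processing, not the EditCrafter implementation. It assumes a (C, H, W) latent and uniform averaging where tiles overlap. The names `tile_starts` and `apply_tiled`, the tile size of 64, and the stride of 48 are hypothetical, and `fn` stands in for whatever per-tile operation is used (for example, one inversion step of a pretrained model at its training resolution).

```python
import numpy as np


def tile_starts(size, tile, stride):
    """Start offsets of tiles of length `tile` that cover [0, size).

    The last tile is placed flush with the edge so that no pixel is missed.
    Assumes stride <= tile so that consecutive tiles touch or overlap.
    """
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)
    return starts


def apply_tiled(latent, fn, tile=64, stride=48):
    """Apply `fn` to overlapping tiles of a (C, H, W) array.

    Overlapping regions are blended by uniform averaging (an assumption;
    other blending weights are possible).
    """
    _, h, w = latent.shape
    out = np.zeros_like(latent, dtype=np.float64)
    weight = np.zeros((1, h, w), dtype=np.float64)
    for top in tile_starts(h, tile, stride):
        for left in tile_starts(w, tile, stride):
            bottom, right = min(top + tile, h), min(left + tile, w)
            patch = latent[:, top:bottom, left:right]
            out[:, top:bottom, left:right] += fn(patch)
            weight[:, top:bottom, left:right] += 1.0
    # Every position is covered by at least one tile, so weight >= 1.
    return out / weight


# Usage: a non-square latent larger than the tile size.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 160, 100))
y = apply_tiled(x, lambda p: p)  # identity stands in for the per-tile step
```

With the identity function as `fn`, the blended output equals the input, which is a quick sanity check that every position is covered and the averaging weights are consistent.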
