DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Abstract

Large-scale text-to-image (T2I) diffusion models have revolutionized image generation over the last few years. Although these models offer diverse, high-quality generation, translating that capability to fine-grained image editing remains challenging. In this paper, we propose DiffEditor to rectify two weaknesses of existing diffusion-based image editing: (1) in complex scenarios, editing results often lack accuracy and exhibit unexpected artifacts; (2) existing methods lack the flexibility to harmonize editing operations, e.g., to imagine new content. In our solution, we introduce image prompts into fine-grained image editing, which cooperate with the text prompt to better describe the editing content. To increase flexibility while maintaining content consistency, we locally combine stochastic differential equation (SDE) sampling with ordinary differential equation (ODE) sampling. In addition, we incorporate regional score-based gradient guidance and a time travel strategy into the diffusion sampling, further improving editing quality. Extensive experiments demonstrate that our method efficiently achieves state-of-the-art performance on various fine-grained image editing tasks, including editing within a single image (e.g., object moving, resizing, and content dragging) and across images (e.g., appearance replacing and object pasting). Our source code is released at https://github.com/MC-E/DragonDiffusion.
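The idea of mixing SDE and ODE sampling by region can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard DDIM update, in which a noise level eta switches between a deterministic (ODE-like, eta = 0) and a stochastic (SDE-like, eta = 1) step, and it applies eta per pixel through a mask. The function name `ddim_step_regional`, the flat-list image layout, and the example values are hypothetical and chosen only for illustration.

```python
import math
import random


def ddim_step_regional(x_t, eps_pred, mask, alpha_t, alpha_prev, rng):
    """One DDIM-style update x_t -> x_{t-1} with a per-pixel noise level.

    x_t, eps_pred and mask are flat lists of equal length (assumed layout).
    mask[i] = 1 marks the editing region (stochastic, SDE-like update);
    mask[i] = 0 marks the rest of the image (deterministic, ODE-like update).
    alpha_t and alpha_prev are the cumulative alpha products at t and t-1.
    """
    # Noise scale of the fully stochastic (eta = 1) DDIM step.
    sigma_full = math.sqrt((1 - alpha_prev) / (1 - alpha_t)) * math.sqrt(
        1 - alpha_t / alpha_prev
    )
    out = []
    for x, eps, eta in zip(x_t, eps_pred, mask):
        sigma = eta * sigma_full
        # Clean-image estimate implied by the predicted noise.
        x0_pred = (x - math.sqrt(1 - alpha_t) * eps) / math.sqrt(alpha_t)
        # Deterministic part pointing back toward x_t.
        direction = math.sqrt(1 - alpha_prev - sigma**2) * eps
        # Fresh noise is injected only where the mask is non-zero.
        noise = sigma * rng.gauss(0.0, 1.0) if sigma > 0 else 0.0
        out.append(math.sqrt(alpha_prev) * x0_pred + direction + noise)
    return out


x = [0.5, -0.2, 1.0, 0.3]
eps = [0.1, 0.0, -0.3, 0.2]
region = [0, 0, 1, 1]  # last two pixels form the hypothetical editing region
x_prev = ddim_step_regional(x, eps, region, 0.5, 0.7, random.Random(0))
```

In this sketch, pixels outside the mask follow the same trajectory on every run, which is what keeps unedited content consistent, while pixels inside the mask receive fresh noise and can therefore produce new content.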
