LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Abstract

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, significant effort is devoted to enabling the modification of these images using text alone, as a means of offering intuitive and versatile editing. However, editing proves difficult for these generative models because editing, by its nature, requires preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently produce an entirely different result, making it exceedingly challenging to attain a one-shot generation that accurately matches the user's intent. In addition, to edit a real image using these state-of-the-art tools, one must first invert the image into the pre-trained model's domain, which adds another factor affecting edit quality as well as latency. In this exploratory report, we propose LEDITS, a combined lightweight approach for real-image editing that incorporates the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real-image editing while also harnessing the editing capabilities of DDPM inversion. This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
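To make the inversion step more concrete, the following is a minimal toy sketch of the general idea behind an edit-friendly DDPM inversion followed by guided resampling. It is not the authors' implementation. Everything named here is an assumption for illustration: the "image" is a short list of floats, toy_eps stands in for a pre-trained noise predictor, guided_eps is a plain guidance-style extrapolation standing in for Semantic Guidance (whose masking and momentum terms are omitted), and the noise scale is simplified to sqrt(beta_t). The sketch shows two things: noise maps extracted from the image reproduce it when sampling with the same predictor, and swapping in a guided predictor while reusing those noise maps changes the output.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule. Index 0 is t=0 (clean image, alpha_bar = 1)."""
    betas = [0.0] + [
        beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        for i in range(num_steps)
    ]
    alpha_bars = [1.0]
    for beta in betas[1:]:
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return betas, alpha_bars

def posterior_mean(x_t, t, eps_fn, betas, alpha_bars):
    """Standard DDPM mean of x_{t-1} given x_t and a noise estimate."""
    eps = eps_fn(x_t, t)
    scale = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    root_alpha = math.sqrt(1.0 - betas[t])
    return [(x - scale * e) / root_alpha for x, e in zip(x_t, eps)]

def invert(x0, eps_fn, betas, alpha_bars, rng):
    """Extract noise maps z_t that encode x0 (toy edit-friendly inversion)."""
    num_steps = len(betas) - 1
    # Each x_t is drawn from x0 with its own independent noise sample.
    xs = [list(x0)]
    for t in range(1, num_steps + 1):
        a = math.sqrt(alpha_bars[t])
        b = math.sqrt(1.0 - alpha_bars[t])
        xs.append([a * v + b * rng.gauss(0.0, 1.0) for v in x0])
    # Solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for z_t.
    zs = [None] * (num_steps + 1)
    for t in range(num_steps, 0, -1):
        mu = posterior_mean(xs[t], t, eps_fn, betas, alpha_bars)
        sigma = math.sqrt(betas[t])  # simplified noise scale (assumption)
        zs[t] = [(prev - m) / sigma for prev, m in zip(xs[t - 1], mu)]
    return xs[num_steps], zs

def sample(x_T, zs, eps_fn, betas, alpha_bars):
    """Run the reverse process with fixed noise maps instead of fresh noise."""
    x = list(x_T)
    for t in range(len(betas) - 1, 0, -1):
        mu = posterior_mean(x, t, eps_fn, betas, alpha_bars)
        sigma = math.sqrt(betas[t])
        x = [m + sigma * z for m, z in zip(mu, zs[t])]
    return x

def toy_eps(x, t):
    """Hypothetical stand-in for a pre-trained noise predictor."""
    return [math.tanh(v) for v in x]

def guided_eps(x, t, edit_scale=2.0):
    """Hypothetical guided predictor: base + scale * (edit - base)."""
    base = toy_eps(x, t)
    edit = [math.tanh(v - 1.0) for v in x]  # stand-in for an edit-conditioned estimate
    return [b + edit_scale * (e - b) for b, e in zip(base, edit)]

betas, alpha_bars = make_schedule(50)
image = [0.5, -1.0, 0.25, 2.0]
x_T, noise_maps = invert(image, toy_eps, betas, alpha_bars, random.Random(0))
reconstruction = sample(x_T, noise_maps, toy_eps, betas, alpha_bars)
edited = sample(x_T, noise_maps, guided_eps, betas, alpha_bars)
print("max reconstruction error:", max(abs(a - b) for a, b in zip(reconstruction, image)))
print("max edit displacement:", max(abs(a - b) for a, b in zip(edited, image)))
```

Because the noise maps are solved for directly, reconstruction with the unchanged predictor holds by construction and needs no optimization loop, which is consistent with the abstract's claim that the approach requires no optimization. In a real pipeline the predictor would be a text-conditioned diffusion network operating on image latents.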
