Continuous Layout Editing of Single Images with Diffusion Models

Abstract

Recent advances in large-scale text-to-image diffusion models have enabled many applications in image editing. However, none of these methods can edit the layout of a single existing image. To address this gap, we propose the first framework for editing the layout of a single image while preserving its visual properties, which allows continuous editing of that image. Our approach relies on two key modules. First, to preserve the characteristics of multiple objects within an image, we disentangle the concepts of different objects and embed them into separate textual tokens using a novel method called masked textual inversion. Next, we propose a training-free optimization method that performs layout control for a pre-trained diffusion model, which allows us to regenerate images with the learned concepts and align them with user-specified layouts. We demonstrate that our method, the first framework to edit the layout of existing images, is effective and outperforms other baselines that were modified to support this task. Our code will be freely available for public use upon acceptance.
