DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Abstract

Precise and controllable image editing is a challenging task that has attracted significant attention. DragGAN recently enabled an interactive point-based image editing framework and achieved impressive editing results with pixel-level precision. However, because this method is based on generative adversarial networks (GANs), its generality is upper-bounded by the capacity of the pretrained GAN models. In this work, we extend this editing framework to diffusion models and propose DragDiffusion. By leveraging large-scale pretrained diffusion models, we greatly improve the applicability of interactive point-based editing in real-world scenarios. While most existing diffusion-based image editing methods operate on text embeddings, DragDiffusion optimizes the diffusion latent to achieve precise spatial control. Although diffusion models generate images iteratively, we show empirically that optimizing the diffusion latent at a single step suffices to generate coherent results, enabling DragDiffusion to complete high-quality editing efficiently. Extensive experiments across a wide range of challenging cases (e.g., multiple objects, diverse object categories, and various styles) demonstrate the versatility and generality of DragDiffusion.

