Diffusion Self-Guidance for Controllable Image Generation

Abstract


Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/
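As a rough illustration of the guidance idea described above, the toy sketch below reads one property (an object's location, taken as the centroid of a spatial map) off a representation, scores it against a target with a differentiable energy, and follows the energy's gradient, much as classifier guidance follows a classifier's gradient. It rests on loudly simplifying assumptions. The 16×16 softmax map standing in for an object's internal representation, the helper names (`softmax_map`, `grid_positions`, `centroid`, `energy_and_grad`, `guide`), the backtracking step rule, and the target location are all hypothetical. The gradient is also taken directly with respect to the map's logits rather than through a pretrained diffusion model, so this is not the paper's implementation.

```python
import numpy as np

# Hypothetical toy setup: a softmax map over grid cells stands in for an object's
# internal representation (for example, an attention map). Not the paper's code.


def softmax_map(z):
    """Toy stand-in for one object's spatial map: a softmax over all H*W cells."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def grid_positions(h, w):
    """(H, W, 2) array of normalized (x, y) cell coordinates in [0, 1]."""
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w), indexing="ij")
    return np.stack([xs, ys], axis=-1)


def centroid(a, pos):
    """Location property: the map-weighted mean position, shape (2,)."""
    return (a[..., None] * pos).sum(axis=(0, 1))


def energy_and_grad(z, target, pos):
    """Energy g = ||centroid(softmax(z)) - target||^2 and its analytic gradient w.r.t. z."""
    a = softmax_map(z)
    c = centroid(a, pos)
    diff = c - target
    g = float(diff @ diff)
    # Softmax Jacobian: d c / d z_m = a_m * (p_m - c),
    # hence dg/dz_m = 2 * a_m * (p_m - c) . (c - target).
    grad = 2.0 * a * ((pos - c) @ diff)
    return g, grad


def guide(z, target, pos, steps=200, lr0=100.0):
    """Gradient descent on z; backtracking ensures the energy never increases."""
    g, grad = energy_and_grad(z, target, pos)
    for _ in range(steps):
        lr = lr0
        for _ in range(30):  # halve the step size until the energy decreases
            z_new = z - lr * grad
            g_new, grad_new = energy_and_grad(z_new, target, pos)
            if g_new < g:
                z, g, grad = z_new, g_new, grad_new
                break
            lr *= 0.5
    return z, g


pos = grid_positions(16, 16)
z0 = np.zeros((16, 16))          # uniform map: centroid starts at the image center
target = np.array([0.8, 0.2])    # hypothetical desired object location (x, y)
g0, _ = energy_and_grad(z0, target, pos)
z, g = guide(z0, target, pos)
print("energy before:", g0, "after:", g, "centroid:", centroid(softmax_map(z), pos))
```

In an actual sampler, the corresponding gradient would be taken with respect to the noisy image latent, through the frozen network, and folded into each denoising update; that step is omitted here. Other properties named above, such as size or shape, would enter as different energy functions computed from the same kind of representation.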
