FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Abstract

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules must be trained separately for each type of spatial condition, model architecture, and checkpoint, which puts them at odds with the diverse intents and preferences a human designer would like to convey to AI models during content creation. In this work, we present FreeControl, a training-free approach for controllable T2I generation that supports multiple conditions, architectures, and checkpoints simultaneously. FreeControl designs structure guidance to facilitate structure alignment with a guidance image, and appearance guidance to enable appearance sharing between images generated using the same seed. Extensive qualitative and quantitative experiments demonstrate the superior performance of FreeControl across a variety of pre-trained T2I models. In particular, FreeControl facilitates convenient training-free control over many different architectures and checkpoints, handles challenging input conditions on which most existing training-free methods fail, and achieves synthesis quality competitive with training-based approaches.
