Generate Anything Anywhere in Any Scene

Abstract

Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields. However, building controllable models for personalized object generation remains challenging. In this paper, we first identify the entanglement issues in existing personalized generative models, and then propose a straightforward and efficient data augmentation training strategy that guides the diffusion model to focus solely on object identity. By inserting the plug-and-play adapter layers from a pre-trained controllable diffusion model, our model gains the ability to control the location and size of each generated personalized object. During inference, we propose a regionally guided sampling technique to maintain the quality and fidelity of the generated images. Our method achieves comparable or superior fidelity for personalized objects, yielding a robust, versatile, and controllable text-to-image diffusion model capable of generating realistic and personalized images. Our approach demonstrates significant potential for applications in art, entertainment, and advertising design.
