TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

Abstract

Despite significant advances in customizing text-to-image and video generation models, generating images and videos that effectively integrate multiple personalized concepts remains a challenging task. To address this, we present TweedieMix, a novel method for composing customized diffusion models during the inference phase. By analyzing the properties of reverse diffusion sampling, our approach divides the sampling process into two stages. During the initial steps, we apply a multiple object-aware sampling technique to ensure the inclusion of the desired target objects. In the later steps, we blend the appearances of the custom concepts in the denoised image space using Tweedie's formula. Our results demonstrate that TweedieMix can generate multiple personalized concepts with higher fidelity than existing methods. Moreover, our framework can be readily extended to image-to-video diffusion models, enabling the generation of videos that feature multiple personalized concepts. Results and source code are available on our anonymous project page.
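To make the later-stage blending step concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a standard DDPM-style noise parameterization, in which Tweedie's formula gives the denoised estimate as x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t). The function names, the per-concept noise predictions, and the binary region masks are all hypothetical placeholders introduced for illustration; the abstract does not specify how the regions are obtained.

```python
import numpy as np


def tweedie_denoise(x_t, eps_hat, alpha_bar_t):
    """Posterior-mean estimate of the clean image from a noisy sample x_t.

    Assumes x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)


def blend_denoised(x_t, eps_background, eps_concepts, masks, alpha_bar_t):
    """Blend per-concept denoised estimates inside hypothetical region masks.

    eps_background: noise prediction from a base model (assumption).
    eps_concepts:   one noise prediction per customized concept (assumption).
    masks:          one array per concept, 1 inside its region, 0 outside.
    """
    blended = tweedie_denoise(x_t, eps_background, alpha_bar_t)
    for eps_c, mask in zip(eps_concepts, masks):
        x0_c = tweedie_denoise(x_t, eps_c, alpha_bar_t)
        # Inside the mask use the concept's estimate; elsewhere keep the rest.
        blended = mask * x0_c + (1.0 - mask) * blended
    return blended


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (8, 8)
    x_t = rng.normal(size=shape)

    # Stand-ins for the outputs of a base model and two customized models.
    eps_bg = rng.normal(size=shape)
    eps_a = rng.normal(size=shape)
    eps_b = rng.normal(size=shape)

    # Two non-overlapping regions: left columns and right columns.
    mask_a = np.zeros(shape)
    mask_a[:, :3] = 1.0
    mask_b = np.zeros(shape)
    mask_b[:, 5:] = 1.0

    x0_mix = blend_denoised(x_t, eps_bg, [eps_a, eps_b], [mask_a, mask_b], 0.5)
    print(x0_mix.shape)
```

Blending in the denoised space rather than directly on the noisy sample means the mixture operates on estimates of clean images, which is the property the abstract attributes to the later sampling steps. How the blended estimate is fed back into the sampler is left out here because the abstract does not describe it.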
