DreamTuner: Single Image is Enough for Subject-Driven Generation

Abstract

Diffusion-based models have demonstrated impressive capabilities in text-to-image generation and show promise for personalized applications of subject-driven generation, which require generating customized concepts from one or a few reference images. However, existing fine-tuning-based methods fail to balance the trade-off between learning the subject and preserving the generation capabilities of the pretrained model. Moreover, methods that rely on additional image encoders tend to lose important details of the subject because of encoding compression. To address these challenges, we propose DreamTuner, a novel method that injects reference information from coarse to fine to achieve subject-driven image generation more effectively. DreamTuner introduces a subject encoder for coarse subject identity preservation, in which the compressed general subject features are injected through an attention layer placed before the visual-text cross-attention. We then modify the self-attention layers within the pretrained text-to-image model into self-subject-attention layers to refine the details of the target subject. In self-subject-attention, the generated image queries detailed features from both the reference image and itself. It is worth emphasizing that self-subject-attention is an effective, elegant, and training-free method for maintaining the detailed features of customized subjects, and it can serve as a plug-and-play solution during inference. Finally, with additional subject-driven fine-tuning, DreamTuner achieves remarkable performance in subject-driven image generation, which can be controlled by text or by other conditions such as pose. For further details, please visit the project page at https://dreamtuner-diffusion.github.io/.
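
The core idea of self-subject-attention, as the abstract describes it, is that queries come from the generated image while keys and values are drawn from both the generated image and the reference image. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes identity query/key/value projections, a single head, and plain concatenation of the two feature sets; the function names `attention` and `self_subject_attention` are hypothetical, and any masking or weighting of reference features that the full method may use is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def self_subject_attention(generated, reference):
    """Hypothetical sketch: queries from the generated image,
    keys/values from the generated image plus the reference image.
    Assumes identity projections (no learned weights)."""
    context = generated + reference  # concatenate along the token axis
    return attention(generated, context, context)


if __name__ == "__main__":
    generated = [[1.0, 0.0]]  # one token from the image being generated
    reference = [[0.0, 1.0]]  # one token from the reference image
    print(self_subject_attention(generated, reference))
```

In this sketch, passing an empty reference list reduces the layer to ordinary self-attention, which is consistent with the abstract's description of the mechanism as a training-free modification that can be plugged in at inference time.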
