DreamTuner: Single Image is Enough for Subject-Driven Generation
December 21, 2023
Authors: Miao Hua, Jiawei Liu, Fei Ding, Wei Liu, Jie Wu, Qian He
cs.AI
Abstract
Diffusion-based models have demonstrated impressive capabilities for
text-to-image generation and hold promise for personalized, subject-driven
generation applications, which require generating customized concepts from one
or a few reference images. However, existing fine-tuning-based methods fail to
balance the trade-off between learning the subject and preserving the
generation capabilities of the pretrained model. Moreover, methods that rely on
additional image encoders tend to lose important subject details due to
encoding compression. To address these challenges, we propose DreamTuner, a
novel method that injects reference information from coarse to fine to achieve
subject-driven image generation more effectively. DreamTuner introduces a
subject-encoder for coarse subject-identity preservation: the compressed,
general subject features are introduced through an attention layer placed
before the visual-text cross-attention. We then modify the self-attention
layers within the pretrained text-to-image model into self-subject-attention
layers to refine the details of the target subject: the generated image queries
detailed features from both the reference image and itself. Notably,
self-subject-attention is an effective, elegant, and training-free way to
maintain the detailed features of customized subjects, and it can serve as a
plug-and-play solution during inference. Finally, with additional
subject-driven fine-tuning, DreamTuner achieves remarkable performance in
subject-driven image generation, which can be controlled by text or other
conditions such as pose. For further details, please visit the project page at
https://dreamtuner-diffusion.github.io/.
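The core idea of self-subject-attention — queries come from the image being generated, while keys and values are drawn from both the generated image and the reference image — can be sketched as below. This is a minimal, single-head, unbatched NumPy illustration under assumed shapes; the function names and weight layout are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_subject_attention(gen_feats, ref_feats, W_q, W_k, W_v):
    """Sketch of self-subject-attention (illustrative, not the official code).

    gen_feats: (N, d_in) features of the image being generated
    ref_feats: (M, d_in) features of the reference image
    Queries come only from the generated image; keys/values come from the
    concatenation of generated and reference features, so the generated
    image can query detailed subject features from the reference.
    """
    q = gen_feats @ W_q                                       # (N, d)
    kv_src = np.concatenate([gen_feats, ref_feats], axis=0)   # (N + M, d_in)
    k = kv_src @ W_k                                          # (N + M, d)
    v = kv_src @ W_v                                          # (N + M, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))            # (N, N + M)
    return attn @ v                                           # (N, d)
```

Because the reference features are simply appended to the key/value set of an existing self-attention layer, no new weights are trained — which is what makes the mechanism plug-and-play at inference time.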