
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

September 12, 2024
作者: NaHyeon Park, Kunhee Kim, Hyunjung Shim
cs.AI

Abstract

Recent breakthroughs in text-to-image models have opened up promising research avenues in personalized image generation, enabling users to create diverse images of a specific subject using natural language prompts. However, existing methods often suffer from performance degradation when given only a single reference image. They tend to overfit the input, producing highly similar outputs regardless of the text prompt. This paper addresses the challenge of one-shot personalization by mitigating overfitting, enabling the creation of controllable images through text prompts. Specifically, we propose a selective fine-tuning strategy that focuses on the text encoder. Furthermore, we introduce three key techniques to enhance personalization performance: (1) augmentation tokens to encourage feature disentanglement and alleviate overfitting, (2) a knowledge-preservation loss to reduce language drift and promote generalizability across diverse prompts, and (3) SNR-weighted sampling for efficient training. Extensive experiments demonstrate that our approach efficiently generates high-quality, diverse images using only a single reference image while significantly reducing memory and storage requirements.
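As an illustration of the knowledge-preservation idea described above, the sketch below penalizes the fine-tuned text encoder for drifting away from its frozen pretrained copy on generic (non-subject) prompts. The embedding tables, token ids, and scales are toy stand-ins chosen for this example, not the paper's actual model or loss weights:

```python
import numpy as np

# Toy embedding tables standing in for the text encoder's outputs
# before and after fine-tuning (illustrative assumption only).
rng = np.random.default_rng(0)
frozen = rng.normal(size=(100, 16))                  # pretrained encoder outputs
tuned = frozen + 0.01 * rng.normal(size=(100, 16))   # outputs after fine-tuning

prior_ids = np.array([3, 7, 42])  # token ids of a generic "prior" prompt

# Knowledge-preservation loss: mean squared drift from the frozen
# encoder on prior prompts, discouraging language drift.
kp_loss = np.mean((tuned[prior_ids] - frozen[prior_ids]) ** 2)
```

In training, a term like `kp_loss` would be added to the usual diffusion reconstruction loss so the encoder learns the new subject without forgetting its general language knowledge.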

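The SNR-weighted sampling technique can likewise be sketched by drawing diffusion timesteps in proportion to a clipped signal-to-noise ratio rather than uniformly. The linear beta schedule and the clipping value below are assumptions for illustration, not the paper's exact settings:

```python
import numpy as np

# Hypothetical DDPM-style linear beta schedule (an assumption here).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

# Signal-to-noise ratio per timestep: SNR(t) = alpha_bar_t / (1 - alpha_bar_t).
snr = alphas_cumprod / (1.0 - alphas_cumprod)

# Weight timesteps by clipped SNR so training emphasizes informative
# (less-noisy) steps; the clip value 5.0 is illustrative.
weights = np.minimum(snr, 5.0)
probs = weights / weights.sum()

rng = np.random.default_rng(0)
t_batch = rng.choice(T, size=8, p=probs)  # timesteps for one training batch
```

Sampling from `probs` instead of a uniform distribution concentrates updates where the denoising signal is strongest, which is what makes the fine-tuning sample-efficient.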
