Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
May 23, 2023
Authors: Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu
cs.AI
Abstract
Recent text-to-image generation models have demonstrated an impressive
capability to generate text-aligned images with high fidelity. However,
generating images of a novel concept provided by a user's input image remains a
challenging task. To address this problem, researchers have explored various
methods for customizing pre-trained text-to-image generation models. Most
existing customization methods rely on regularization techniques to prevent
over-fitting. While regularization eases the challenge of customization and
enables successful content creation with respect to text guidance, it may
restrict the model's capability, resulting in the loss of detailed information
and inferior performance. In this work, we propose a novel framework for
customized text-to-image generation that does not use regularization.
Specifically, the proposed framework consists of an encoder network and a novel
sampling method that together tackle the over-fitting problem without
regularization. With this framework, we can customize a large-scale
text-to-image generation model within half a minute on a single GPU, using only
one image provided by the user. We demonstrate in experiments that the proposed
framework outperforms existing methods and preserves more fine-grained details.