

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

May 23, 2023
作者: Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu
cs.AI

Abstract

Recent text-to-image generation models have demonstrated an impressive capability to generate high-fidelity images that are well aligned with text. However, generating images of a novel concept provided through a user's input image remains a challenging task. To address this problem, researchers have been exploring various methods for customizing pre-trained text-to-image generation models. Currently, most existing methods involve regularization techniques to prevent over-fitting. While regularization eases the challenge of customization and leads to successful content creation with respect to text guidance, it may restrict the model's capability, resulting in the loss of detailed information and inferior performance. In this work, we propose a novel framework for customized text-to-image generation that does not use regularization. Specifically, the proposed framework consists of an encoder network and a novel sampling method that together tackle the over-fitting problem without regularization. With the proposed framework, we are able to customize a large-scale text-to-image generation model within half a minute on a single GPU, with only one image provided by the user. We demonstrate in experiments that our proposed framework outperforms existing methods and preserves more fine-grained details.
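To make the encoder-based customization idea concrete, the following is a minimal toy sketch (not the paper's actual architecture) of the general pattern: an encoder maps a single user image to a pseudo-word embedding that is spliced into the prompt's token embeddings, so one image customizes the text condition. All dimensions, names, and the linear encoder here are illustrative assumptions.

```python
# Toy sketch of encoder-based concept injection for customized
# text-to-image generation. Shapes and the linear "encoder" are
# hypothetical placeholders, not the paper's real components.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8        # token-embedding dimension (hypothetical)
IMG_FEAT_DIM = 16  # image-feature dimension (hypothetical)

# Hypothetical encoder: one linear projection from image features
# to a single pseudo-token embedding representing the new concept.
W = rng.standard_normal((IMG_FEAT_DIM, EMB_DIM)) * 0.1

def encode_concept(image_features: np.ndarray) -> np.ndarray:
    """Map image features to one pseudo-token embedding."""
    return image_features @ W

def customize_prompt(token_embs: np.ndarray,
                     concept_emb: np.ndarray,
                     placeholder_idx: int) -> np.ndarray:
    """Replace the placeholder token's embedding with the concept embedding,
    leaving the rest of the prompt embeddings untouched."""
    out = token_embs.copy()
    out[placeholder_idx] = concept_emb
    return out

# Usage: a 5-token prompt where index 2 is the "<new-concept>" placeholder.
prompt_embs = rng.standard_normal((5, EMB_DIM))
image_feats = rng.standard_normal(IMG_FEAT_DIM)
customized = customize_prompt(prompt_embs, encode_concept(image_feats), 2)
print(customized.shape)  # (5, 8)
```

The customized embedding sequence would then condition the diffusion model's denoising steps; the paper's contribution additionally includes a sampling method that avoids over-fitting without a regularization term, which this sketch does not attempt to reproduce.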