Ultra-Resolution Adaptation with Ease

March 20, 2025
Authors: Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang
cs.AI

Abstract

Text-to-image diffusion models have achieved remarkable progress in recent years. However, training models for high-resolution image generation remains challenging, particularly when training data and computational resources are limited. In this paper, we explore this practical problem from two key perspectives, data and parameter efficiency, and propose a set of key guidelines for ultra-resolution adaptation termed URAE. For data efficiency, we theoretically and empirically demonstrate that synthetic data generated by some teacher models can significantly promote training convergence. For parameter efficiency, we find that tuning minor components of the weight matrices outperforms widely used low-rank adapters when synthetic data are unavailable, offering substantial performance gains while maintaining efficiency. Additionally, for models leveraging guidance distillation, such as FLUX, we show that disabling classifier-free guidance, i.e., setting the guidance scale to 1 during adaptation, is crucial for satisfactory performance. Extensive experiments validate that URAE achieves 2K-generation performance comparable to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations, while setting new benchmarks for 4K-resolution generation. Code is available at https://github.com/Huage001/URAE.
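To make the parameter-efficiency idea concrete, below is a minimal NumPy sketch of splitting a weight matrix into its major components (top singular directions, kept frozen) and minor components (the residual, which would be fine-tuned). This is only an illustrative decomposition under the assumption that "minor components" refers to the lesser singular directions of the weight matrix; the function name `split_minor_components` and the rank threshold are hypothetical, and the actual URAE procedure is defined in the linked repository.

```python
import numpy as np

def split_minor_components(W, rank_major):
    """Decompose W = W_major + W_minor via SVD.

    W_major spans the top `rank_major` singular directions and is frozen;
    W_minor (the residual over the remaining directions) is the part
    that would receive gradient updates during adaptation.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_major = U[:, :rank_major] @ np.diag(S[:rank_major]) @ Vt[:rank_major, :]
    W_minor = W - W_major
    return W_major, W_minor

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_major, W_minor = split_minor_components(W, rank_major=48)

# The decomposition is exact by construction: the frozen and trainable
# parts sum back to the original weight matrix.
assert np.allclose(W_major + W_minor, W)
```

Unlike a low-rank adapter, which adds a new low-rank delta on top of the frozen weights, this view fine-tunes an existing low-energy subspace of the pretrained matrix, which the abstract reports works better when synthetic data are unavailable.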
