Ultra-Resolution Adaptation with Ease
March 20, 2025
Authors: Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang
cs.AI
Abstract
Text-to-image diffusion models have achieved remarkable progress in recent
years. However, training models for high-resolution image generation remains
challenging, particularly when training data and computational resources are
limited. In this paper, we explore this practical problem from two key
perspectives: data and parameter efficiency, and propose a set of key
guidelines for ultra-resolution adaptation termed URAE. For data
efficiency, we theoretically and empirically demonstrate that synthetic data
generated by some teacher models can significantly promote training
convergence. For parameter efficiency, we find that tuning minor components of
the weight matrices outperforms widely-used low-rank adapters when synthetic
data are unavailable, offering substantial performance gains while maintaining
efficiency. Additionally, for models leveraging guidance distillation, such as
FLUX, we show that disabling classifier-free guidance, i.e., setting
the guidance scale to 1 during adaptation, is crucial for satisfactory
performance. Extensive experiments validate that URAE achieves comparable
2K-generation performance to state-of-the-art closed-source models like FLUX1.1
[Pro] Ultra with only 3K samples and 2K iterations, while setting new
benchmarks for 4K-resolution generation. Code is available at
https://github.com/Huage001/URAE.
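To make the data-efficiency guideline concrete, below is a minimal sketch of collecting teacher-generated synthetic data, assuming the Hugging Face diffusers FluxPipeline as a stand-in teacher; the checkpoint name, prompt list, resolution, and output paths are illustrative assumptions, not taken from the URAE codebase.

```python
# Sketch: building a small synthetic training set from a teacher model.
# The teacher checkpoint, prompts, and paths are assumptions; the paper
# reports that ~3K samples suffice for 2K adaptation.
import os
import torch
from diffusers import FluxPipeline

teacher = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder for a stronger teacher
    torch_dtype=torch.bfloat16,
).to("cuda")

os.makedirs("synthetic_data", exist_ok=True)
prompts = [
    "a misty mountain lake at dawn, ultra-detailed",
    "a close-up portrait of a red fox in falling snow",
]  # in practice, a few thousand diverse prompts

for i, prompt in enumerate(prompts):
    image = teacher(prompt, height=2048, width=2048).images[0]
    image.save(os.path.join("synthetic_data", f"{i:05d}.png"))
    # Each (prompt, image) pair becomes a training sample for the student.
```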
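The parameter-efficiency guideline, tuning the minor components of a weight matrix rather than attaching a low-rank adapter, can be pictured as an SVD-based split in which only the smallest singular components remain trainable. The class name, the choice of how many components count as minor, and the reconstruction details in this PyTorch sketch are assumptions, not the paper's exact implementation.

```python
# Sketch: making only the minor (smallest) singular components of a frozen
# pretrained weight trainable, as an alternative to a low-rank adapter.
# Names and the choice of num_minor are illustrative assumptions.
import torch
import torch.nn as nn

class MinorComponentLinear(nn.Module):
    def __init__(self, weight: torch.Tensor, num_minor: int = 16):
        super().__init__()
        r = num_minor
        # Decompose the pretrained weight: W = U diag(S) V^T.
        # torch.linalg.svd returns singular values in descending order,
        # so the last r components are the minor ones.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Major components (largest singular values) stay frozen.
        self.register_buffer("U_major", U[:, :-r].contiguous())
        self.register_buffer("S_major", S[:-r].contiguous())
        self.register_buffer("Vh_major", Vh[:-r, :].contiguous())
        # Minor components are the only trainable parameters.
        self.U_minor = nn.Parameter(U[:, -r:].contiguous())
        self.S_minor = nn.Parameter(S[-r:].contiguous())
        self.Vh_minor = nn.Parameter(Vh[-r:, :].contiguous())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reassemble W from frozen major and trainable minor components.
        W = (self.U_major * self.S_major) @ self.Vh_major \
            + (self.U_minor * self.S_minor) @ self.Vh_minor
        return x @ W.T
```

A module like this could replace, for example, the attention projection layers of the diffusion backbone, keeping the trainable parameter count comparable to that of a rank-r LoRA.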
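Finally, for guidance-distilled models such as FLUX, the abstract recommends setting the guidance scale to 1 during adaptation. A hedged sketch of what that might look like in a training step follows, assuming the diffusers FluxTransformer2DModel interface; the function name, surrounding variables, and the loss target are placeholders rather than the URAE training code.

```python
# Sketch: disabling (distilled) classifier-free guidance during adaptation
# by fixing the guidance embedding to 1.0. Tensor shapes and the loss
# target below are illustrative placeholders, not the URAE training code.
import torch
import torch.nn.functional as F

def adaptation_step(transformer, noisy_latents, timestep,
                    encoder_hidden_states, pooled_projections,
                    img_ids, txt_ids, target):
    # Key point from the paper: guidance scale fixed to 1 while adapting,
    # i.e., classifier-free guidance effectively disabled.
    guidance = torch.full(
        (noisy_latents.shape[0],), 1.0,
        device=noisy_latents.device, dtype=noisy_latents.dtype,
    )
    pred = transformer(
        hidden_states=noisy_latents,
        timestep=timestep,
        guidance=guidance,
        encoder_hidden_states=encoder_hidden_states,
        pooled_projections=pooled_projections,
        img_ids=img_ids,
        txt_ids=txt_ids,
    ).sample
    return F.mse_loss(pred, target)
```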