Ultra-Resolution Adaptation with Ease
March 20, 2025
Authors: Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang
cs.AI
Abstract
Text-to-image diffusion models have achieved remarkable progress in recent
years. However, training models for high-resolution image generation remains
challenging, particularly when training data and computational resources are
limited. In this paper, we explore this practical problem from two key
perspectives: data and parameter efficiency, and propose a set of key
guidelines for ultra-resolution adaptation termed URAE. For data
efficiency, we theoretically and empirically demonstrate that synthetic data
generated by some teacher models can significantly promote training
convergence. For parameter efficiency, we find that tuning minor components of
the weight matrices outperforms widely-used low-rank adapters when synthetic
data are unavailable, offering substantial performance gains while maintaining
efficiency. Additionally, for models leveraging guidance distillation, such as
FLUX, we show that disabling classifier-free guidance, i.e., setting
the guidance scale to 1 during adaptation, is crucial for satisfactory
performance. Extensive experiments validate that URAE achieves comparable
2K-generation performance to state-of-the-art closed-source models like FLUX1.1
[Pro] Ultra with only 3K samples and 2K iterations, while setting new
benchmarks for 4K-resolution generation. Code is available at
https://github.com/Huage001/URAE.
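To make the data-efficiency guideline concrete, below is a minimal sketch of collecting teacher-generated synthetic data, assuming the Hugging Face diffusers FluxPipeline as a stand-in teacher; the checkpoint name, prompt list, resolution, and output paths are illustrative assumptions, not taken from the URAE codebase.

```python
# Sketch: building a small synthetic training set from a teacher model.
# The teacher checkpoint, prompts, and paths are assumptions; the paper
# reports that ~3K samples suffice for 2K adaptation.
import os
import torch
from diffusers import FluxPipeline

teacher = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder for a stronger teacher
    torch_dtype=torch.bfloat16,
).to("cuda")

os.makedirs("synthetic_data", exist_ok=True)
prompts = [
    "a misty mountain lake at dawn, ultra-detailed",
    "a close-up portrait of a red fox in falling snow",
]  # in practice, a few thousand diverse prompts

for i, prompt in enumerate(prompts):
    image = teacher(prompt, height=2048, width=2048).images[0]
    image.save(os.path.join("synthetic_data", f"{i:05d}.png"))
    # Each (prompt, image) pair becomes a training sample for the student.
```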
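The parameter-efficiency guideline, tuning the minor components of a weight matrix rather than attaching a low-rank adapter, can be pictured as an SVD-based split in which only the smallest singular components remain trainable. The class name, the choice of how many components count as minor, and the reconstruction details in this PyTorch sketch are assumptions, not the paper's exact implementation.

```python
# Sketch: making only the minor (smallest) singular components of a frozen
# pretrained weight trainable, as an alternative to a low-rank adapter.
# Names and the choice of num_minor are illustrative assumptions.
import torch
import torch.nn as nn

class MinorComponentLinear(nn.Module):
    def __init__(self, weight: torch.Tensor, num_minor: int = 16):
        super().__init__()
        r = num_minor
        # Decompose the pretrained weight: W = U diag(S) V^T.
        # torch.linalg.svd returns singular values in descending order,
        # so the last r components are the minor ones.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        # Major components (largest singular values) stay frozen.
        self.register_buffer("U_major", U[:, :-r].contiguous())
        self.register_buffer("S_major", S[:-r].contiguous())
        self.register_buffer("Vh_major", Vh[:-r, :].contiguous())
        # Minor components are the only trainable parameters.
        self.U_minor = nn.Parameter(U[:, -r:].contiguous())
        self.S_minor = nn.Parameter(S[-r:].contiguous())
        self.Vh_minor = nn.Parameter(Vh[-r:, :].contiguous())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reassemble W from frozen major and trainable minor components.
        W = (self.U_major * self.S_major) @ self.Vh_major \
            + (self.U_minor * self.S_minor) @ self.Vh_minor
        return x @ W.T
```

A module like this could replace, for example, the attention projection layers of the diffusion backbone, keeping the trainable parameter count comparable to that of a rank-r LoRA.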
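Finally, for guidance-distilled models such as FLUX, the abstract recommends setting the guidance scale to 1 during adaptation. A hedged sketch of what that might look like in a training step follows, assuming the diffusers FluxTransformer2DModel interface; the function name, surrounding variables, and the loss target are placeholders rather than the URAE training code.

```python
# Sketch: disabling (distilled) classifier-free guidance during adaptation
# by fixing the guidance embedding to 1.0. Tensor shapes and the loss
# target below are illustrative placeholders, not the URAE training code.
import torch
import torch.nn.functional as F

def adaptation_step(transformer, noisy_latents, timestep,
                    encoder_hidden_states, pooled_projections,
                    img_ids, txt_ids, target):
    # Key point from the paper: guidance scale fixed to 1 while adapting,
    # i.e., classifier-free guidance effectively disabled.
    guidance = torch.full(
        (noisy_latents.shape[0],), 1.0,
        device=noisy_latents.device, dtype=noisy_latents.dtype,
    )
    pred = transformer(
        hidden_states=noisy_latents,
        timestep=timestep,
        guidance=guidance,
        encoder_hidden_states=encoder_hidden_states,
        pooled_projections=pooled_projections,
        img_ids=img_ids,
        txt_ids=txt_ids,
    ).sample
    return F.mse_loss(pred, target)
```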