Ultra-Resolution Adaptation with Ease

Abstract

Text-to-image diffusion models have achieved remarkable progress in recent years. However, training models for high-resolution image generation remains challenging, particularly when training data and computational resources are limited. In this paper, we explore this practical problem from two key perspectives, data efficiency and parameter efficiency, and propose a set of key guidelines for ultra-resolution adaptation, termed URAE. For data efficiency, we theoretically and empirically demonstrate that synthetic data generated by teacher models can significantly accelerate training convergence. For parameter efficiency, we find that, when synthetic data are unavailable, tuning the minor components of the weight matrices outperforms widely used low-rank adapters, offering substantial performance gains while maintaining efficiency. Additionally, for models that leverage guidance distillation, such as FLUX, we show that disabling classifier-free guidance, i.e., setting the guidance scale to 1 during adaptation, is crucial for satisfactory performance. Extensive experiments validate that URAE achieves 2K-generation performance comparable to state-of-the-art closed-source models such as FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations, while setting new benchmarks for 4K-resolution generation. Code is available at https://github.com/Huage001/URAE.
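To make concrete why a guidance scale of 1 amounts to disabling guidance, the sketch below uses the standard classifier-free guidance combination eps_hat = eps_u + w * (eps_c - eps_u), where eps_u and eps_c are the unconditional and conditional noise predictions. At w = 1 the extrapolation term cancels and only the conditional prediction remains. This is an illustration of the general formula under our own assumptions, not the authors' code: in guidance-distilled models such as FLUX, the scale is supplied to the network as a conditioning input rather than applied through this explicit combination, and the function name `cfg_combine` and the toy prediction values are hypothetical.

```python
from typing import List


def cfg_combine(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Element-wise classifier-free guidance: eps_u + scale * (eps_c - eps_u).

    Hypothetical helper for illustration; real pipelines apply this to tensors.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (hypothetical values, for illustration only).
eps_u = [0.10, -0.20, 0.30]
eps_c = [0.50, 0.00, -0.10]

# scale = 1: the guidance term cancels, leaving (up to float rounding) the conditional prediction.
no_guidance = cfg_combine(eps_u, eps_c, 1.0)

# scale > 1: the prediction is extrapolated further along (eps_c - eps_u).
strong_guidance = cfg_combine(eps_u, eps_c, 3.5)

print(no_guidance)
print(strong_guidance)
```

Any scale above 1 pushes the sample away from the unconditional prediction, so w = 1 is the only setting in which the model's conditional output is used unchanged.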