I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
October 10, 2024
Authors: Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao
cs.AI
Abstract
Rectified Flow Transformers (RFTs) offer superior training and inference
efficiency, making them likely the most viable direction for scaling up
diffusion models. However, progress in generation resolution has been
relatively slow due to data quality and training costs. Tuning-free resolution
extrapolation presents an alternative, but current methods often reduce
generative stability, limiting practical application. In this paper, we review
existing resolution extrapolation methods and introduce the I-Max framework to
maximize the resolution potential of Text-to-Image RFTs. I-Max features: (i) a
novel Projected Flow strategy for stable extrapolation and (ii) an advanced
inference toolkit for generalizing model knowledge to higher resolutions.
Experiments with Lumina-Next-2K and Flux.1-dev demonstrate I-Max's ability to
enhance stability in resolution extrapolation and show that it can produce
emergent image detail and correct artifacts, confirming the practical value of
tuning-free resolution extrapolation.