I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

October 10, 2024
Authors: Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao
cs.AI

Abstract

Rectified Flow Transformers (RFTs) offer superior training and inference efficiency, making them likely the most viable direction for scaling up diffusion models. However, progress in generation resolution has been relatively slow due to data quality and training costs. Tuning-free resolution extrapolation presents an alternative, but current methods often reduce generative stability, limiting practical application. In this paper, we review existing resolution extrapolation methods and introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs. I-Max features: (i) a novel Projected Flow strategy for stable extrapolation and (ii) an advanced inference toolkit for generalizing model knowledge to higher resolutions. Experiments with Lumina-Next-2K and Flux.1-dev demonstrate that I-Max improves the stability of resolution extrapolation and that it can bring out emergent image detail and correct artifacts, confirming the practical value of tuning-free resolution extrapolation.
