InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
September 12, 2025
Authors: Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai
cs.AI
Abstract
Arbitrary-resolution image generation provides a consistent visual experience
across devices and has broad applications for both producers and consumers.
In current diffusion models, computational demand grows quadratically with
resolution, so generating a 4K image takes more than 100 seconds. To address
this, we explore a second generation step on top of latent diffusion models:
the fixed-size latent produced by the diffusion model is treated as the content
representation, and a one-step generator decodes images of arbitrary resolution
from this compact latent. We thus present InfGen, which replaces the VAE
decoder with the new generator and produces images at any resolution from a
fixed-size latent without retraining the diffusion model. This simplifies the
pipeline, reduces computational complexity, and applies to any model that uses
the same latent space. Experiments show that InfGen brings many existing models
into the arbitrary high-resolution era while cutting 4K image generation time
to under 10 seconds.
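
To give a rough feel for the scaling problem described in the abstract, the sketch below counts latent positions and pairwise attention interactions at several output resolutions. It is illustrative arithmetic only, not the authors' cost model: the 8x VAE downsampling factor, the use of full self-attention over latent positions, and the function names `latent_tokens`, `attention_pairs`, and `scaling_table` are all assumptions introduced here.

```python
# Illustrative arithmetic only. The 8x downsampling factor and the
# full self-attention cost model are assumptions, not figures from the abstract.

def latent_tokens(height: int, width: int, downsample: int = 8) -> int:
    """Latent positions for an image, assuming each side shrinks by `downsample`."""
    return (height // downsample) * (width // downsample)


def attention_pairs(tokens: int) -> int:
    """Pairwise interactions in full self-attention over `tokens` positions."""
    return tokens * tokens


def scaling_table(resolutions, downsample: int = 8):
    """Return (height, width, tokens, attention cost relative to the first row)."""
    base = attention_pairs(latent_tokens(*resolutions[0], downsample))
    rows = []
    for height, width in resolutions:
        tokens = latent_tokens(height, width, downsample)
        rows.append((height, width, tokens, attention_pairs(tokens) / base))
    return rows


if __name__ == "__main__":
    sizes = [(512, 512), (1024, 1024), (2048, 2048), (4096, 4096)]
    for height, width, tokens, ratio in scaling_table(sizes):
        print(f"{height}x{width}: {tokens} latent positions, "
              f"{ratio:.0f}x attention pairs relative to the first row")
```

Under these assumptions, running diffusion directly at the target resolution makes the latent grow with the image. In the setup the abstract describes, the diffusion model always works on the same fixed-size latent, so its cost would stay at the first row regardless of output size, and only the one-step generator sees the target resolution.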