InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

September 12, 2025
Authors: Tao Han, Wanghan Xu, Junchao Gong, Xiaoyu Yue, Song Guo, Luping Zhou, Lei Bai
cs.AI

Abstract

Arbitrary-resolution image generation provides a consistent visual experience across devices and has broad applications for both producers and consumers. In current diffusion models, computational demand grows quadratically with resolution, leading to 4K generation latencies of over 100 seconds. To address this, we explore a second generation stage on top of latent diffusion models: the fixed-size latent produced by the diffusion model is treated as a content representation, and a one-step generator decodes this compact latent into an image at arbitrary resolution. We present InfGen, which replaces the VAE decoder with this new generator to produce images at any resolution from a fixed-size latent without retraining the diffusion model. This simplifies the pipeline, reduces computational complexity, and applies to any model that uses the same latent space. Experiments show that InfGen can bring many models into the era of arbitrary high resolution while cutting 4K image generation time to under 10 seconds.
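The quadratic-cost claim can be made concrete with a back-of-the-envelope calculation. The Python sketch below is illustrative only: it assumes a DiT-style latent diffusion model with 8× VAE downsampling, 2×2 patches, and full self-attention whose per-step cost grows with the square of the token count (constants and non-attention layers are ignored). None of these numbers come from the paper, and the helper names `num_tokens` and `relative_attention_cost` are hypothetical.

```python
# Back-of-the-envelope self-attention cost for a DiT-style latent diffusion model.
# Assumptions (illustrative, not from the paper): 8x VAE downsampling,
# 2x2 patches, full self-attention whose cost scales with tokens**2.

VAE_DOWNSAMPLE = 8  # assumed spatial compression factor of the VAE
PATCH_SIZE = 2      # assumed patch size of the diffusion transformer


def num_tokens(height: int, width: int) -> int:
    """Number of transformer tokens for an image of the given pixel size."""
    latent_h = height // VAE_DOWNSAMPLE
    latent_w = width // VAE_DOWNSAMPLE
    return (latent_h // PATCH_SIZE) * (latent_w // PATCH_SIZE)


def relative_attention_cost(height: int, width: int,
                            base: tuple[int, int] = (1024, 1024)) -> float:
    """Per-step self-attention cost relative to denoising at the base resolution."""
    return (num_tokens(height, width) / num_tokens(*base)) ** 2


if __name__ == "__main__":
    for h, w in [(1024, 1024), (2048, 2048), (3840, 2160), (4096, 4096)]:
        print(f"{w}x{h}: {num_tokens(h, w)} tokens, "
              f"{relative_attention_cost(h, w):.0f}x attention cost")
```

Under these assumptions, denoising natively at 4096×4096 costs 256× the per-step attention of 1024×1024. Keeping the diffusion latent at the base size, as InfGen does, holds that cost fixed and moves the resolution change into a single decoding pass, whose own cost this sketch does not model.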
September 15, 2025