MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

August 20, 2024
Authors: Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
cs.AI

Abstract

Diffusion models have emerged as frontrunners in text-to-image generation thanks to their impressive capabilities. However, because they are trained at a fixed image resolution, they often struggle with high-resolution generation, producing artifacts such as semantic inaccuracies and object replication. This paper introduces MegaFusion, a novel approach that extends existing diffusion-based text-to-image models towards efficient higher-resolution generation without additional fine-tuning or adaptation. Specifically, we employ a truncate-and-relay strategy to bridge the denoising processes across different resolutions, allowing high-resolution images to be generated in a coarse-to-fine manner. Moreover, by integrating dilated convolutions and noise re-scheduling, we further adapt the model's priors to higher resolutions. MegaFusion is applicable to both latent-space and pixel-space diffusion models, along with other derivative models. Extensive experiments confirm that MegaFusion significantly boosts the capability of existing models to produce megapixel images with various aspect ratios, while requiring only about 40% of the original computational cost.
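
The coarse-to-fine idea in the abstract can be illustrated with a toy sketch. The abstract does not give the actual relay formula, schedule, or upsampling operator, so everything below is an assumption made for illustration: the denoiser is a hypothetical stand-in that predicts zero noise, the sampler is a deterministic DDIM-style update, and the relay step (estimate the clean image, upsample it, re-noise it to the same noise level) is one plausible reading of "truncate and relay". Dilated convolutions and noise re-scheduling are not modelled, and all function names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a 2-D array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def toy_denoiser(x_t, alpha_bar):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return np.zeros_like(x_t)

def predict_x0(x_t, eps, alpha_bar):
    """Estimate the clean sample from a noisy one and the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)

def ddim_step(x_t, eps, ab_t, ab_next):
    """One deterministic DDIM-style update from noise level ab_t to ab_next."""
    x0 = predict_x0(x_t, eps, ab_t)
    return np.sqrt(ab_next) * x0 + np.sqrt(1.0 - ab_next) * eps

def truncate_and_relay(base_size, factors, steps_per_stage, alpha_bars, rng,
                       denoiser=toy_denoiser):
    """Assumed coarse-to-fine sampler: denoise at low resolution, truncate,
    then relay the intermediate result to a higher resolution and continue.

    alpha_bars runs from most noisy (small) to least noisy (near 1).
    """
    assert len(factors) == len(steps_per_stage) - 1
    assert sum(steps_per_stage) == len(alpha_bars) - 1
    x = rng.standard_normal((base_size, base_size))
    i = 0  # index of the current noise level
    for stage, n_steps in enumerate(steps_per_stage):
        if stage > 0:
            # Relay (assumption): estimate x0, upsample, re-noise to level i.
            ab = alpha_bars[i]
            x0 = predict_x0(x, denoiser(x, ab), ab)
            x0 = upsample_nearest(x0, factors[stage - 1])
            noise = rng.standard_normal(x0.shape)
            x = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise
        for _ in range(n_steps):
            eps = denoiser(x, alpha_bars[i])
            x = ddim_step(x, eps, alpha_bars[i], alpha_bars[i + 1])
            i += 1
    return x

# Usage: 6 steps at 8x8, then relay to 16x16 for the remaining 4 steps.
rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.05, 0.999, 11)
image = truncate_and_relay(8, [2], [6, 4], alpha_bars, rng)
print(image.shape)  # (16, 16)
```

In this sketch, only the last stage runs at the target resolution, which is the intuition behind a coarse-to-fine sampler costing less than full-resolution denoising; the toy does not reproduce the roughly 40% figure reported in the abstract.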
