MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

August 20, 2024
Authors: Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
cs.AI

Abstract

Diffusion models have emerged as frontrunners in text-to-image generation for their impressive capabilities. Nonetheless, their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic inaccuracies and object replication. This paper introduces MegaFusion, a novel approach that extends existing diffusion-based text-to-image generation models towards efficient higher-resolution generation without additional fine-tuning or extra adaptation. Specifically, we employ an innovative truncate and relay strategy to bridge the denoising processes across different resolutions, allowing for high-resolution image generation in a coarse-to-fine manner. Moreover, by integrating dilated convolutions and noise re-scheduling, we further adapt the model's priors for higher resolution. The versatility and efficacy of MegaFusion make it universally applicable to both latent-space and pixel-space diffusion models, along with other derivative models. Extensive experiments confirm that MegaFusion significantly boosts the capability of existing models to produce images of megapixels and various aspect ratios, while only requiring about 40% of the original computational cost.
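
The truncate-and-relay idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that "truncate" means stopping the low-resolution denoising early and taking the current clean-image estimate, and that "relay" means upsampling that estimate and re-noising it to the same noise level before continuing at the higher resolution. The names `toy_eps_model`, `truncate_and_relay`, `steps_per_stage`, and `factor` are hypothetical, the noise predictor is a stand-in rather than a pretrained model, and the dilated convolutions and noise re-scheduling mentioned above are not modelled.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W) array by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def toy_eps_model(x_t, step):
    """Hypothetical stand-in for a pretrained noise predictor (not a real model)."""
    return 0.1 * x_t


def ddim_step(x_t, eps, a_t, a_prev):
    """One deterministic DDIM update; also returns the clean-image estimate."""
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    x_prev = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x_prev, x0_hat


def truncate_and_relay(eps_model, abar, base_size, steps_per_stage, factor, rng):
    """Coarse-to-fine sampling sketch.

    abar: cumulative alphas ordered from noisiest (abar[0]) to clean (abar[-1]).
    steps_per_stage: denoising steps spent at each resolution (each >= 1).
    factor: integer upsampling factor applied between consecutive stages.
    """
    assert sum(steps_per_stage) == len(abar) - 1
    x = rng.standard_normal((base_size, base_size))  # start from pure noise
    i = 0
    for stage, n_steps in enumerate(steps_per_stage):
        for _ in range(n_steps):
            eps = eps_model(x, i)
            x, x0_hat = ddim_step(x, eps, abar[i], abar[i + 1])
            i += 1
        if stage < len(steps_per_stage) - 1:
            # Truncate: keep the clean estimate from the last step of this stage.
            # Relay: upsample it, then re-noise to the current noise level
            # (assumed relay rule) so the next stage resumes at higher resolution.
            x0_up = upsample_nearest(x0_hat, factor)
            noise = rng.standard_normal(x0_up.shape)
            x = np.sqrt(abar[i]) * x0_up + np.sqrt(1.0 - abar[i]) * noise
    return x


rng = np.random.default_rng(0)
abar = np.linspace(0.01, 1.0, 11)  # toy 10-step schedule, noisy to clean
image = truncate_and_relay(toy_eps_model, abar, 16, [4, 3, 3], 2, rng)
print(image.shape)
```

In this sketch only the last three of the ten steps run at the full 64×64 size, which hints at how a coarse-to-fine schedule could reduce compute; the actual step split and the 40% figure come from the paper, not from this toy.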
