Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

March 24, 2025
Authors: Jinjin Zhang, Qiuyu Huang, Junjie Liu, Xiefan Guo, Di Huang
cs.AI

Abstract

In this paper, we present Diffusion-4K, a novel framework for direct ultra-high-resolution image synthesis using text-to-image diffusion models. The core advancements include: (1) Aesthetic-4K Benchmark: addressing the absence of a publicly available 4K image synthesis dataset, we construct Aesthetic-4K, a comprehensive benchmark for ultra-high-resolution image generation. We curated a high-quality 4K dataset with carefully selected images and captions generated by GPT-4o. Additionally, we introduce GLCM Score and Compression Ratio metrics to evaluate fine details, combined with holistic measures such as FID, Aesthetics and CLIPScore for a comprehensive assessment of ultra-high-resolution images. (2) Wavelet-based Fine-tuning: we propose a wavelet-based fine-tuning approach for direct training with photorealistic 4K images, applicable to various latent diffusion models, demonstrating its effectiveness in synthesizing highly detailed 4K images. Consequently, Diffusion-4K achieves impressive performance in high-quality image synthesis and text prompt adherence, especially when powered by modern large-scale diffusion models (e.g., SD3-2B and Flux-12B). Extensive experimental results from our benchmark demonstrate the superiority of Diffusion-4K in ultra-high-resolution image synthesis.
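The abstract names a GLCM Score but does not define it. The sketch below assumes the score is built from standard gray-level co-occurrence statistics. It computes a normalized, symmetric GLCM for one non-negative pixel offset, then two common texture statistics: contrast and homogeneity. The function names, the 8-level quantization and the single horizontal offset are illustrative choices, not the benchmark's definition. The abstract also does not say how Aesthetic-4K combines such statistics into a single score.

```python
from typing import List

Matrix = List[List[float]]

def glcm(gray: List[List[int]], levels: int = 8, dx: int = 1, dy: int = 0) -> Matrix:
    """Normalized symmetric GLCM of an 8-bit grayscale image (hypothetical helper).

    Assumes non-negative offsets (dx, dy) and intensities in 0..255.
    """
    h, w = len(gray), len(gray[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(h - dy):
        for x in range(w - dx):
            # Quantize 0..255 intensities into `levels` bins.
            i = gray[y][x] * levels // 256
            j = gray[y + dy][x + dx] * levels // 256
            counts[i][j] += 1
            counts[j][i] += 1  # count both directions so the matrix is symmetric
    total = sum(map(sum, counts))
    if total == 0:
        return counts  # image too small for this offset
    return [[c / total for c in row] for row in counts]

def glcm_contrast(p: Matrix) -> float:
    """Sum of P(i, j) * (i - j)^2: large when neighbouring pixels differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def glcm_homogeneity(p: Matrix) -> float:
    """Sum of P(i, j) / (1 + |i - j|): close to 1 for smooth, uniform regions."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))

if __name__ == "__main__":
    flat = [[128] * 4 for _ in range(4)]
    checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
    print(glcm_contrast(glcm(flat)), glcm_contrast(glcm(checker)))
```

A flat patch yields zero contrast. A black-and-white checkerboard yields the maximum contrast for 8 levels, because every horizontal neighbour pair falls in bins 0 and 7.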
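The Compression Ratio metric rests on one intuition: fine detail makes an image harder to compress. The sketch below illustrates that idea with `zlib` from the Python standard library as a stand-in codec. The codec, quality settings and exact ratio definition used by Aesthetic-4K are not given in the abstract and are assumptions here. `compression_ratio` is a hypothetical helper.

```python
import random
import zlib

def compression_ratio(pixels: bytes, level: int = 9) -> float:
    """Raw size divided by compressed size (zlib used as a stand-in codec).

    Detail-rich (less redundant) pixel data compresses poorly and therefore
    gives a lower ratio; smooth or flat data gives a higher one.
    """
    if not pixels:
        raise ValueError("empty pixel buffer")
    compressed = zlib.compress(pixels, level)
    return len(pixels) / len(compressed)

if __name__ == "__main__":
    flat = bytes(4096)  # a 64x64 all-black 8-bit patch
    rng = random.Random(0)
    noisy = bytes(rng.randrange(256) for _ in range(4096))  # high-entropy patch
    print(compression_ratio(flat), compression_ratio(noisy))
```

A lossless codec keeps the example deterministic and dependency-free. A lossy image codec would behave differently in absolute terms, but should preserve the same ordering between smooth and detailed content.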
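The abstract does not spell out the wavelet-based fine-tuning objective. One plausible reading, offered here as an assumption, works in two steps. First, decompose the predicted and target latents with a discrete wavelet transform. Then compare them subband by subband, so that high-frequency detail can be weighted explicitly. The sketch applies a single-level orthonormal 2D Haar transform to one latent channel stored as nested lists. `haar_dwt2`, `wavelet_loss` and the per-subband weights are hypothetical names, not the paper's code.

```python
from typing import Dict, List, Optional

Grid = List[List[float]]

def haar_dwt2(x: Grid) -> Dict[str, Grid]:
    """One level of the orthonormal 2D Haar transform (even height and width)."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands: Dict[str, Grid] = {k: [] for k in ("LL", "LH", "HL", "HH")}
    for r in range(0, h, 2):
        rows: Dict[str, List[float]] = {k: [] for k in bands}
        for c in range(0, w, 2):
            a, b = x[r][c], x[r][c + 1]
            d0, d1 = x[r + 1][c], x[r + 1][c + 1]
            rows["LL"].append((a + b + d0 + d1) / 2)  # local average (low frequency)
            rows["LH"].append((a - b + d0 - d1) / 2)  # difference across columns
            rows["HL"].append((a + b - d0 - d1) / 2)  # difference across rows
            rows["HH"].append((a - b - d0 + d1) / 2)  # diagonal difference
        for k in bands:
            bands[k].append(rows[k])
    return bands

def wavelet_loss(pred: Grid, target: Grid,
                 weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of squared errors between Haar subbands of pred and target."""
    weights = weights or {"LL": 1.0, "LH": 1.0, "HL": 1.0, "HH": 1.0}
    pb, tb = haar_dwt2(pred), haar_dwt2(target)
    loss = 0.0
    for name, wgt in weights.items():
        sq = sum((p - t) ** 2
                 for prow, trow in zip(pb[name], tb[name])
                 for p, t in zip(prow, trow))
        loss += wgt * sq
    return loss

if __name__ == "__main__":
    pred = [[1.0, 2.0], [3.0, 4.0]]
    target = [[0.0, 0.0], [0.0, 0.0]]
    print(wavelet_loss(pred, target))  # equals the spatial sum of squared errors
    print(wavelet_loss(pred, target, {"LH": 1.0, "HL": 1.0, "HH": 1.0}))
```

The Haar basis is orthonormal, so uniform weights give the same value as a plain squared error in the spatial domain. The weights only change the objective once subbands are reweighted, for example by emphasizing LH, HL and HH to stress fine detail.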
