UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

October 23, 2025
Authors: Chen Zhao, En Ci, Yunzhe Xu, Tiehan Fan, Shanyan Guan, Yanhao Ge, Jian Yang, Ying Tai
cs.AI

Abstract

Ultra-high-resolution (UHR) text-to-image (T2I) generation has seen notable progress. However, two key challenges remain: (1) the absence of a large-scale, high-quality UHR T2I dataset, and (2) the lack of training strategies tailored to fine-grained detail synthesis in UHR scenarios. To tackle the first challenge, we introduce UltraHR-100K, a high-quality dataset of 100K UHR images with rich captions, offering diverse content and strong visual fidelity. Each image exceeds 3K resolution and is rigorously curated based on detail richness, content complexity, and aesthetic quality. To tackle the second challenge, we propose a frequency-aware post-training method that enhances fine-detail generation in T2I diffusion models. Specifically, we design (i) Detail-Oriented Timestep Sampling (DOTS) to focus learning on detail-critical denoising steps, and (ii) Soft-Weighting Frequency Regularization (SWFR), which leverages the Discrete Fourier Transform (DFT) to softly constrain frequency components, encouraging high-frequency detail preservation. Extensive experiments on our proposed UltraHR-eval4K benchmark demonstrate that our approach significantly improves the fine-grained detail quality and overall fidelity of UHR image generation. The code is available at https://github.com/NJU-PCALab/UltraHR-100k.
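
The abstract does not give the formulas behind DOTS or SWFR, so the following is only an illustrative sketch of the two ideas, not the authors' implementation. It assumes a Beta-distributed timestep sampler (with t near 0 taken to mean low noise, where fine detail is presumably refined) and a linear radial frequency weight applied to the DFT of the prediction error. All function names, the `alpha` parameter, and the Beta parameters are hypothetical.

```python
import numpy as np


def sample_detail_oriented_timesteps(rng, batch_size, a=1.0, b=3.0):
    """Draw timesteps in [0, 1] from Beta(a, b) instead of a uniform law.

    Assumption: t near 0 is the low-noise, detail-critical regime, so
    a < b skews sampling toward it. The paper's sampler may differ.
    """
    return rng.beta(a, b, size=batch_size)


def radial_frequency_weights(height, width, alpha=1.0):
    """Soft weights that grow linearly with normalized radial frequency.

    Hypothetical choice: w = 1 + alpha * r, with r = 0 at the DC term and
    r = 1 at the highest frequency on the grid.
    """
    fy = np.fft.fftfreq(height)[:, None]  # cycles per pixel along rows
    fx = np.fft.fftfreq(width)[None, :]   # cycles per pixel along columns
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius = radius / max(radius.max(), 1e-12)  # guard the 1x1 case
    return 1.0 + alpha * radius


def soft_weighted_frequency_loss(pred, target, alpha=1.0):
    """Mean of the weighted squared DFT magnitude of (pred - target)."""
    diff_spectrum = np.fft.fft2(pred - target, norm="ortho")
    weights = radial_frequency_weights(*pred.shape[-2:], alpha=alpha)
    return float(np.mean(weights * np.abs(diff_spectrum) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = sample_detail_oriented_timesteps(rng, batch_size=4)
    target = rng.standard_normal((8, 8))
    pred = target + 0.1 * rng.standard_normal((8, 8))
    print("timesteps:", t)
    print("loss:", soft_weighted_frequency_loss(pred, target))
```

Because the DFT here uses orthonormal scaling, setting `alpha=0` reduces the loss to the ordinary pixel-space mean squared error (Parseval's theorem). Any `alpha > 0` penalizes an error of the same energy more heavily the higher its frequency, which is one simple way to read "softly constrain frequency components".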