GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

May 29, 2026
Authors: Xiangtao Kong, Jixin Zhao, Lingchen Sun, Rongyuan Wu, Lei Zhang
cs.AI

Abstract

Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT), which uses generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with adaptive prompting based on a vision-language model (VLM) shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and that GGT-100K is a useful resource for expanding the generalization boundaries of real-world IR models.