GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
May 29, 2026
Authors: Xiangtao Kong, Jixin Zhao, Lingchen Sun, Rongyuan Wu, Lei Zhang
cs.AI
Abstract
Real-world image restoration (IR) is bottlenecked by the scarcity of high-quality paired training data. Synthetic datasets are abundant but often fail to model real-world degradations, while real-world paired datasets are expensive and difficult to capture. As a result, IR models trained on these datasets show limited generalization in real-world scenarios. In this work, we propose Generative Ground Truth (GGT) by using generative multimodal foundation models (MFMs) to produce high-quality (HQ) targets from real-world low-quality (LQ) images. We first conduct a systematic evaluation of nine state-of-the-art MFMs, including Nano-Banana-2 and GPT-Image-2, on images of various scenes and degradation types. The results demonstrate that Nano-Banana-2 with VLM-based adaptive prompting shows the highest capability to synthesize perceptually realistic and content-faithful HQ targets, which can serve as the GGT for the LQ input. We then employ Nano-Banana-2 to build a GGT synthesis pipeline, which involves multi-stage quality control to ensure data reliability, and construct GGT-100K, an LQ-HQ paired dataset comprising 103,707 training pairs and covering diverse scenes and complex real-world degradations. A test set of 500 image pairs is also established. Extensive experiments show that GGT-100K consistently improves the real-world generalization of a wide range of IR models, with particularly strong benefits for finetuning generative models for IR tasks. Our results suggest that MFMs can serve as practical tools for restoration-oriented data generation, and GGT-100K is a useful resource to expand the generalization boundaries of real-world IR models.