
RefGC-SR^2: Reference-guided Generated Content Super-Resolution and Refinement

June 13, 2026
Authors: Jeahun Sung, Dahyeon Kye, Soo Ye Kim, Jihyong Oh
cs.AI

Abstract

Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by the user is downsampled to a fixed low resolution (LR) before being fed into the model, so its fine-grained details are discarded before the output is even produced. The generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task, reference-guided generated content super-resolution and refinement (RefGC-SR^2), in which the original HRRI is reused at the post-processing stage to recover lost details, correct generative artifacts, and upscale the output simultaneously. We construct the first real-world triplet data generation pipeline for the RefGC-SR^2 task, training a diptych-conditioned generator to synthesize the paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer model for RefGC-SR^2 that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR^2 model (i) faithfully refines the object identity with respect to the reference and (ii) recovers high-resolution details, yielding final results that are of significantly higher quality and more practically usable than those of existing RefGCR and RefSR baselines.
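
The information-loss argument above can be made concrete with a toy example. The pure-Python sketch below is only an illustration under stated assumptions, not the authors' model or data: it uses block averaging as a stand-in for whatever resizing a real pipeline applies, a hypothetical 8x8 "reference" built from a coarse step plus a fine checkerboard texture, and a simple low/high band split that is unrelated to the frequency-aware diffusion transformer described in the abstract.

```python
def box_downsample(img, factor):
    """Average non-overlapping factor x factor blocks (assumed stand-in for resizing)."""
    out = []
    for y in range(0, len(img), factor):
        row = []
        for x in range(0, len(img[0]), factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def nearest_upsample(img, factor):
    """Repeat every pixel factor times along both axes."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def split_bands(img, factor):
    """Return (low, high) with low + high == img.

    low is what survives a downsample/upsample round trip;
    high is the fine detail that the round trip discards.
    """
    low = nearest_upsample(box_downsample(img, factor), factor)
    high = [[a - b for a, b in zip(r_img, r_low)]
            for r_img, r_low in zip(img, low)]
    return low, high


def energy(img):
    """Sum of squared pixel values."""
    return sum(v * v for row in img for v in row)


# Hypothetical 8x8 reference: a coarse step (left half 0, right half 1)
# plus a fine checkerboard texture of amplitude 0.5.
SIZE, FACTOR = 8, 4
coarse = [[x // FACTOR for x in range(SIZE)] for _ in range(SIZE)]
texture = [[0.5 if (x + y) % 2 == 0 else -0.5 for x in range(SIZE)]
           for y in range(SIZE)]
hrri = [[coarse[y][x] + texture[y][x] for x in range(SIZE)]
        for y in range(SIZE)]

low, high = split_bands(hrri, FACTOR)
print("detail energy lost by downsampling:", energy(high))
```

In this constructed case the round trip keeps the coarse step and removes the checkerboard entirely, so the discarded band equals the texture. That is the kind of detail a post-processing stage could only recover by going back to the original high-resolution reference.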