RefGC-SR^2: Reference-Guided Generated Content Super-Resolution and Refinement

Abstract

Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by the user is downsampled to a fixed low resolution (LR) before being fed into the model, so its fine-grained details are discarded before the output is even produced. The generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task, reference-guided generated content super-resolution-refinement (RefGC-SR^2), in which the original HRRI is reused at the post-processing stage to simultaneously recover lost details, correct generative artifacts, and upscale the output. We construct the first real-world triplet data generation pipeline for the RefGC-SR^2 task, training a diptych-conditioned generator to synthesize the paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer for RefGC-SR^2 that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR^2 model (i) refines object identity faithfully with respect to the reference and (ii) recovers high-resolution details, yielding results of significantly higher quality and greater practical usability than existing RefGCR and RefSR baselines.
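To make the information loss described in the abstract concrete, the following is a minimal NumPy sketch. It is not the paper's method or its frequency-aware model. It assumes a single-channel image, an integer downsampling factor, average pooling as a stand-in for the fixed-LR resize, and nearest-neighbour upsampling; the function names `downsample`, `upsample`, and `split_frequencies` are hypothetical. It splits a high-resolution reference into the part that survives a fixed-LR resize and the high-frequency residual that is discarded.

```python
import numpy as np


def downsample(img, factor):
    """Average-pool a 2D array by an integer factor (assumed stand-in for the LR resize)."""
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0, "size must be divisible by factor"
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def upsample(img, factor):
    """Nearest-neighbour upsampling back to the original grid."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def split_frequencies(hrri, factor):
    """Return (low, high): what a fixed-LR resize keeps, and the detail it drops."""
    low = upsample(downsample(hrri, factor), factor)
    high = hrri - low
    return low, high


# Synthetic "reference image" used purely for illustration.
rng = np.random.default_rng(0)
hrri = rng.random((64, 64))

low, high = split_frequencies(hrri, factor=8)

# Fraction of the image's variance that lives only in the discarded residual.
lost_fraction = float((high ** 2).sum() / ((hrri - hrri.mean()) ** 2).sum())
print(f"fraction of variance lost to downsampling: {lost_fraction:.3f}")
```

Under these assumptions the residual `high` has zero mean within every pooled block, so no model that sees only the downsampled reference can recover it; this is the detail a post-processing stage would have to take from the original HRRI.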