RefGC-SR^2: Reference-Guided Generated Content Super-Resolution and Refinement

Abstract

Reference-guided generation (e.g., object compositing and customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by the user is downsampled to a fixed low resolution (LR) before being fed into the model, so its fine-grained details are discarded before the output is even produced. The generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain; reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task, reference-guided generated content super-resolution-refinement (RefGC-SR^2), in which the original HRRI is reused at the post-processing stage to simultaneously recover lost details, correct generative artifacts, and upscale the output. We construct the first real-world triplet data generation pipeline for the RefGC-SR^2 task, training a diptych-conditioned generator to synthesize the paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer for RefGC-SR^2 that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR^2 model (i) faithfully refines object identity with respect to the reference and (ii) recovers high-resolution details, yielding results of significantly higher quality and greater practical usability than existing RefGCR and RefSR baselines.
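
The resolution argument above can be made concrete with a small numerical sketch. It is illustrative only and is not the paper's method: the random array stands in for an HRRI, the 4x box downsampling and nearest-neighbor upsampling are assumed stand-ins for a pipeline's resizing steps, and the helper names and the 0.125 cycles/pixel cutoff are hypothetical choices. The sketch measures how much high-frequency spectral energy survives a downsample-then-upsample round trip when the reference is not reused.

```python
import numpy as np

def box_downsample(img, factor):
    """Average non-overlapping factor x factor blocks (a simple stand-in for resizing)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def nearest_upsample(img, factor):
    """Repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def high_freq_energy(img, cutoff):
    """Sum of spectral power at radial frequencies above `cutoff` (cycles/pixel)."""
    spectrum = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    mask = np.sqrt(fx ** 2 + fy ** 2) > cutoff
    return float(np.sum(np.abs(spectrum[mask]) ** 2))

rng = np.random.default_rng(0)
hrri = rng.random((256, 256))        # hypothetical stand-in for a high-resolution reference
lr = box_downsample(hrri, 4)         # the fixed low-resolution input a generator would see
restored = nearest_upsample(lr, 4)   # naive upscaling that never looks at the reference again

e_ref = high_freq_energy(hrri, cutoff=0.125)
e_restored = high_freq_energy(restored, cutoff=0.125)
print(f"high-frequency energy kept: {e_restored / e_ref:.3f}")
```

Under these assumptions, only a small fraction of the reference's high-frequency energy remains after the round trip, which is the loss that reusing the original HRRI at post-processing time is meant to recover.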