OmniRefiner: Reinforcement-Guided Local Diffusion Refinement
November 25, 2025
Authors: Yaoli Liu, Ziheng Ouyang, Shengtao Lou, Yiren Song
cs.AI
Abstract
Reference-guided image generation has progressed rapidly, yet current diffusion models still struggle to preserve fine-grained visual details when refining a generated image using a reference. This limitation arises because VAE-based latent compression inherently discards subtle texture information, causing identity- and attribute-specific cues to vanish. Moreover, post-editing approaches built on existing methods to amplify local details often produce results that are inconsistent with the original image in lighting, texture, or shape. To address this, we introduce OmniRefiner, a detail-aware refinement framework that performs two consecutive stages of reference-driven correction to enhance pixel-level consistency. We first adapt a single-image diffusion editor by fine-tuning it to jointly ingest the draft image and the reference image, enabling globally coherent refinement while maintaining structural fidelity. We then apply reinforcement learning to further strengthen localized editing capability, explicitly optimizing for detail accuracy and semantic consistency. Extensive experiments demonstrate that OmniRefiner significantly improves reference alignment and fine-grained detail preservation, producing faithful and visually coherent edits that surpass both open-source and commercial models on challenging reference-guided restoration benchmarks.
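To make the second-stage objective concrete, the sketch below illustrates the kind of reward the abstract describes: reinforcement learning that explicitly optimizes detail accuracy against the reference together with semantic consistency with the draft. All names here (detail_reward, semantic_reward, rl_score) and the placeholder reward definitions are hypothetical illustrations under our own assumptions; the paper's actual model, reward functions, and training procedure are not given in this abstract.

```python
# Hedged sketch of a stage-2 RL scoring function for reference-driven refinement.
# The reward terms below are simple stand-ins, not the paper's definitions.

import torch
import torch.nn.functional as F


def detail_reward(edited: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    # Placeholder for "detail accuracy": negative pixel-level error against the
    # reference image (a real system would likely use a perceptual or patch metric).
    return -F.l1_loss(edited, reference)


def semantic_reward(edited: torch.Tensor, draft: torch.Tensor) -> torch.Tensor:
    # Placeholder for "semantic consistency": cosine similarity between flattened
    # images stands in for a learned semantic/identity feature comparison.
    return F.cosine_similarity(edited.flatten(1), draft.flatten(1)).mean()


def rl_score(edited, reference, draft, w_detail=1.0, w_semantic=1.0):
    # Combined objective: reward edits that recover fine details from the
    # reference while staying semantically consistent with the draft.
    return w_detail * detail_reward(edited, reference) + \
        w_semantic * semantic_reward(edited, draft)


if __name__ == "__main__":
    # Toy tensors standing in for the draft, the reference, and an edited result.
    draft = torch.rand(1, 3, 256, 256)
    reference = torch.rand(1, 3, 256, 256)
    edited = 0.5 * draft + 0.5 * reference
    print(float(rl_score(edited, reference, draft)))
```

In such a setup, the first stage (supervised fine-tuning of the diffusion editor on draft/reference pairs) would provide the policy that the RL stage then refines against a score of this form; how the paper actually parameterizes and optimizes that policy is not specified here.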