LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

September 26, 2025
Authors: Song Fei, Tian Ye, Lujia Wang, Lei Zhu
cs.AI

Abstract

Universal image restoration (UIR) aims to recover images degraded by unknown mixtures of degradations while preserving semantics -- conditions under which discriminative restorers and UNet-based diffusion priors often oversmooth, hallucinate, or drift. We present LucidFlux, a caption-free UIR framework that adapts a large diffusion transformer (Flux.1) without relying on image captions. LucidFlux introduces a lightweight dual-branch conditioner that injects signals from the degraded input and from a lightly restored proxy to anchor geometry and suppress artifacts, respectively. A timestep- and layer-adaptive modulation schedule then routes these cues across the backbone's hierarchy, yielding coarse-to-fine, context-aware updates that protect global structure while recovering texture. To avoid the latency and instability of text prompts or MLLM captions, we enforce caption-free semantic alignment via SigLIP features extracted from the proxy. A scalable curation pipeline further filters large-scale data for structure-rich supervision. Across synthetic and in-the-wild benchmarks, LucidFlux consistently outperforms strong open-source and commercial baselines, and ablation studies verify the necessity of each component. LucidFlux shows that, for large diffusion transformers (DiTs), when, where, and what to condition on -- rather than adding parameters or relying on text prompts -- is the governing lever for robust, caption-free universal image restoration in the wild.
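
The idea of timestep- and layer-adaptive modulation over two conditioning branches can be illustrated with a toy sketch. The abstract does not specify how the schedule is computed, so everything below is an assumption made for illustration: the function names, the hand-written logistic gate, the `sharpness` parameter, and the choice to favor the degraded input at noisy steps and shallow layers. It is not the LucidFlux implementation, and it uses plain Python lists in place of real feature tensors.

```python
import math

def modulation_weights(t, layer, num_layers, sharpness=4.0):
    """Hypothetical gate returning (w_degraded, w_proxy), which sum to 1.

    t is a normalized diffusion time in [0, 1], with 1 meaning pure noise.
    Assumption: noisy steps and shallow layers lean on the degraded input,
    while late steps and deep layers lean on the lightly restored proxy.
    """
    depth = layer / max(num_layers - 1, 1)          # 0 = first block, 1 = last
    score = sharpness * ((1.0 - t) + depth - 1.0)   # ranges over [-sharpness, sharpness]
    w_proxy = 1.0 / (1.0 + math.exp(-score))        # logistic gate in (0, 1)
    return 1.0 - w_proxy, w_proxy

def condition_layer(hidden, degraded_feat, proxy_feat, t, layer, num_layers):
    """Add a gated mix of the two branch features to one layer's hidden state."""
    w_deg, w_proxy = modulation_weights(t, layer, num_layers)
    return [h + w_deg * d + w_proxy * p
            for h, d, p in zip(hidden, degraded_feat, proxy_feat)]
```

In a trained system such weights would presumably be learned rather than fixed; the sketch only shows that the same two cues can be weighted differently depending on when (timestep) and where (layer) they are injected.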