LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Abstract

Universal image restoration (UIR) aims to recover images degraded by unknown mixtures of corruptions while preserving semantics, a setting in which discriminative restorers and UNet-based diffusion priors often oversmooth, hallucinate, or drift. We present LucidFlux, a UIR framework that adapts a large diffusion transformer (Flux.1) without relying on image captions. LucidFlux introduces a lightweight dual-branch conditioner that injects signals from the degraded input and from a lightly restored proxy, which respectively anchor geometry and suppress artifacts. A timestep- and layer-adaptive modulation schedule then routes these cues across the backbone's hierarchy, yielding coarse-to-fine, context-aware updates that protect global structure while recovering texture. To avoid the latency and instability of text prompts or MLLM-generated captions, we enforce caption-free semantic alignment using SigLIP features extracted from the proxy. A scalable curation pipeline further filters large-scale data to provide structure-rich supervision. Across synthetic and in-the-wild benchmarks, LucidFlux consistently outperforms strong open-source and commercial baselines, and ablation studies confirm the necessity of each component. LucidFlux shows that, for large DiTs, deciding when, where, and what to condition on, rather than adding parameters or relying on text prompts, is the key lever for robust, caption-free universal image restoration in the wild.
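The abstract names a timestep- and layer-adaptive modulation schedule but does not specify it. As a purely illustrative aid, here is a minimal sketch of what such a schedule could look like. This is not the LucidFlux implementation. It assumes that each branch (degraded input, lightly restored proxy) emits per-channel scale and shift terms in the adaLN/FiLM style, and that a logistic function of the normalized timestep `t` and the layer depth decides how much each branch contributes. The names `Modulation`, `branch_weights`, and `modulate`, the direction of the schedule, and its constants are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Modulation:
    """Per-channel scale (gamma) and shift (beta) from one condition branch (hypothetical)."""
    gamma: List[float]
    beta: List[float]


def branch_weights(t: float, layer: int, num_layers: int) -> Tuple[float, float]:
    """Hypothetical timestep- and layer-adaptive schedule.

    t is normalized diffusion time in [0, 1] (1 = pure noise).
    Returns (w_input, w_proxy), which sum to 1.
    Assumption: noisy steps and shallow layers favor the degraded-input branch
    (geometry), while cleaner steps and deeper layers favor the proxy branch
    (artifact suppression). The constant 4.0 is arbitrary.
    """
    depth = layer / max(num_layers - 1, 1)  # 0.0 = shallowest, 1.0 = deepest
    logit = 4.0 * (t - 0.5) - 4.0 * (depth - 0.5)
    w_input = 1.0 / (1.0 + math.exp(-logit))  # logistic blend
    return w_input, 1.0 - w_input


def modulate(hidden: List[float], from_input: Modulation, from_proxy: Modulation,
             t: float, layer: int, num_layers: int) -> List[float]:
    """Mix both branches with the schedule, then apply h * (1 + gamma) + beta per channel."""
    w_in, w_px = branch_weights(t, layer, num_layers)
    out = []
    for i, h in enumerate(hidden):
        gamma = w_in * from_input.gamma[i] + w_px * from_proxy.gamma[i]
        beta = w_in * from_input.beta[i] + w_px * from_proxy.beta[i]
        out.append(h * (1.0 + gamma) + beta)
    return out


if __name__ == "__main__":
    # Noisy step, shallow layer: the degraded-input branch dominates.
    print(branch_weights(1.0, 0, 4))
    # Clean step, deep layer: the proxy branch dominates.
    print(branch_weights(0.0, 3, 4))
```

The logistic form is chosen only because it keeps both weights in (0, 1) and summing to 1, so the mixed modulation never exceeds either branch's range; any monotone schedule over timestep and depth would illustrate the same idea.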
