LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Abstract

Universal image restoration (UIR) aims to recover images degraded by unknown mixtures of degradations while preserving semantics, conditions under which discriminative restorers and UNet-based diffusion priors often oversmooth, hallucinate, or drift. We present LucidFlux, a caption-free UIR framework that adapts a large diffusion transformer (Flux.1) without relying on image captions. LucidFlux introduces a lightweight dual-branch conditioner that injects signals from the degraded input and from a lightly restored proxy, which respectively anchor geometry and suppress artifacts. A timestep- and layer-adaptive modulation schedule then routes these cues across the backbone's hierarchy, yielding coarse-to-fine, context-aware updates that protect global structure while recovering texture. To avoid the latency and instability of text prompts or multimodal large language model (MLLM) captions, we enforce caption-free semantic alignment via SigLIP features extracted from the proxy. A scalable curation pipeline further filters large-scale data for structure-rich supervision. Across synthetic and in-the-wild benchmarks, LucidFlux consistently outperforms strong open-source and commercial baselines, and ablation studies verify the necessity of each component. LucidFlux shows that, for large diffusion transformers (DiTs), when, where, and what to condition on, rather than adding parameters or relying on text prompts, is the governing lever for robust, caption-free universal image restoration in the wild.
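
To make the idea of a timestep- and layer-adaptive modulation schedule more concrete, the following is a minimal sketch of how two conditioning branches might be weighted as a function of diffusion timestep and transformer depth. It is not the LucidFlux implementation: the function names, the closed-form sigmoid gate, the constants `w_t` and `w_l`, the direction of the schedule, and the choice to make the two gates sum to one are all assumptions made purely for illustration. In a real model such gates would presumably be learned rather than fixed.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def modulation_weights(t: float, layer: int, num_layers: int,
                       w_t: float = 4.0, w_l: float = 4.0) -> Tuple[float, float]:
    """Return hypothetical gates (g_degraded, g_proxy), each in (0, 1).

    t:      diffusion timestep normalised to [0, 1], where 1 is pure noise.
    layer:  0-based index of the transformer block.
    w_t, w_l: illustrative constants, not values from the paper.
    """
    depth = layer / max(num_layers - 1, 1)  # relative depth in [0, 1]
    # Assumption: noisy steps and shallow layers lean on the degraded input
    # (coarse layout); clean steps and deep layers lean on the proxy (detail).
    g_degraded = sigmoid(w_t * (t - 0.5) - w_l * (depth - 0.5))
    g_proxy = 1.0 - g_degraded
    return g_degraded, g_proxy


def inject(hidden: List[float], cond_degraded: List[float],
           cond_proxy: List[float], t: float, layer: int,
           num_layers: int) -> List[float]:
    """Add the gated conditioning signals to one block's hidden features."""
    g_d, g_p = modulation_weights(t, layer, num_layers)
    return [h + g_d * d + g_p * p
            for h, d, p in zip(hidden, cond_degraded, cond_proxy)]
```

Under these assumptions, calling `modulation_weights(1.0, 0, 4)` gives a larger degraded-input gate than `modulation_weights(0.0, 3, 4)`, which is one simple way to obtain the coarse-to-fine behaviour described above.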