LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Abstract

Universal image restoration (UIR) aims to recover images degraded by unknown mixtures of degradations while preserving semantics, conditions under which discriminative restorers and UNet-based diffusion priors often oversmooth, hallucinate, or drift. We present LucidFlux, a caption-free UIR framework that adapts a large diffusion transformer (Flux.1) without image captions. LucidFlux introduces a lightweight dual-branch conditioner that injects signals from the degraded input and from a lightly restored proxy to anchor geometry and suppress artifacts, respectively. A timestep- and layer-adaptive modulation schedule then routes these cues across the backbone's hierarchy, yielding coarse-to-fine, context-aware updates that protect global structure while recovering texture. To avoid the latency and instability of text prompts or MLLM captions, we enforce caption-free semantic alignment via SigLIP features extracted from the proxy. A scalable curation pipeline further filters large-scale data for structure-rich supervision. Across synthetic and in-the-wild benchmarks, LucidFlux consistently outperforms strong open-source and commercial baselines, and ablation studies verify the necessity of each component. LucidFlux shows that, for large DiTs, the governing lever for robust, caption-free universal image restoration in the wild is when, where, and what to condition on, rather than adding parameters or relying on text prompts.
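To make the idea of timestep- and layer-adaptive modulation concrete, the sketch below mixes features from the two condition branches (degraded input and lightly restored proxy) with a gate that depends on the diffusion time and the block depth. The abstract does not give the functional form, so everything here is an illustrative assumption rather than the LucidFlux implementation: the sigmoid gate, the function names `branch_gate` and `fuse_conditions`, the constants `w_t`, `w_l`, and `bias` (stand-ins for learned parameters), and the direction in which the gate favors one branch over the other.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def branch_gate(t, layer_idx, num_layers, w_t=4.0, w_l=4.0, bias=-4.0):
    """Hypothetical gate in (0, 1) that depends on timestep and depth.

    t          -- normalized diffusion time in [0, 1]
    layer_idx  -- 0-based index of the transformer block
    num_layers -- total number of blocks in the backbone
    w_t, w_l, bias -- assumed stand-ins for parameters that would be learned
    """
    depth = layer_idx / max(num_layers - 1, 1)  # relative depth in [0, 1]
    return sigmoid(w_t * t + w_l * depth + bias)


def fuse_conditions(hidden, degraded_feat, proxy_feat, t, layer_idx, num_layers):
    """Add a gated mix of the two condition branches to one token's hidden state.

    All three vectors are plain lists of floats with the same length.
    """
    g = branch_gate(t, layer_idx, num_layers)
    return [
        h + g * d + (1.0 - g) * p
        for h, d, p in zip(hidden, degraded_feat, proxy_feat)
    ]
```

In this toy form, the same pair of condition features contributes differently at each block and at each denoising step, which is the "when and where to condition" behavior the abstract describes. A real model would learn the gate, and would more likely produce per-channel scale and shift terms, instead of using fixed constants.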
