Decoupled Residual Denoising Diffusion Models for Unified and Data-efficient Image-to-Image Translation

Abstract

We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a previously under-explored property of diffusion models. Crucially, beyond its conventional role of manifold lifting (i.e., moving data off low-dimensional manifolds), injecting Gaussian noise facilitates domain harmonization by implicitly aligning feature distributions across domains, a property particularly advantageous for unified I2I translation. However, existing diffusion models prematurely erode this harmonization effect, because noise and residuals are removed simultaneously in a single coupled diffusion process. To address this, DRDD decouples the diffusion process into two sequential and independent diffusion stages: (1) a stochastic noise diffusion for domain harmonization and manifold lifting, and (2) a deterministic residual diffusion that learns the core semantic mapping entirely within the fixed-noise domain. This decoupling preserves the harmonization and manifold lifting effects throughout the transformation, substantially simplifying the learning of unified mappings across diverse tasks and domains. Notably, the noise diffusion stage is trained exclusively on abundant, unpaired target-domain images, greatly improving data efficiency. Comprehensive theoretical and empirical analysis demonstrates that DRDD is broadly compatible with mainstream diffusion models and consistently delivers robust, unified I2I translation, even under limited paired data. Our code is available at https://github.com/HKU-HealthAI/DRDD.
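
To make the two-stage idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a simple additive Gaussian noise model and a linear residual path between source and target; the function names (`noise_stage`, `residual_stage`, `standardized_mean_gap`), the noise level `sigma`, and the schedule variable `t` are all hypothetical and do not come from the DRDD repository. It illustrates two points from the abstract: the residual stage can move from the source to the target while the noise sample stays fixed, and a larger noise level shrinks the standardized gap between two domains.

```python
import numpy as np


def noise_stage(x, sigma, rng):
    """Stochastic stage (assumed form): add Gaussian noise of std sigma to x."""
    eps = rng.standard_normal(x.shape)
    return x + sigma * eps, eps


def residual_stage(x_src, x_tgt, eps, sigma, t):
    """Deterministic stage (assumed form): move along the residual x_tgt - x_src
    for t in [0, 1] while the noise sample eps is held fixed."""
    residual = x_tgt - x_src
    return x_src + t * residual + sigma * eps


def standardized_mean_gap(a, b, sigma):
    """Gap between the means of two domains, measured in units of the pooled
    standard deviation after adding independent noise of std sigma
    (computed analytically from the sample statistics)."""
    pooled_var = 0.5 * (a.var() + b.var()) + sigma ** 2
    return abs(a.mean() - b.mean()) / np.sqrt(pooled_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_src = rng.uniform(size=(8, 8))  # stand-in for a source-domain image
    x_tgt = rng.uniform(size=(8, 8))  # stand-in for its paired target image

    noisy_src, eps = noise_stage(x_src, sigma=0.5, rng=rng)
    noisy_tgt = residual_stage(x_src, x_tgt, eps, sigma=0.5, t=1.0)
    # The noise component is identical at both ends of the residual path.
    print(np.allclose(noisy_tgt - x_tgt, noisy_src - x_src))

    # Two toy "domains" with different means: more noise, smaller standardized gap.
    dom_a = rng.normal(0.0, 1.0, size=1000)
    dom_b = rng.normal(2.0, 1.0, size=1000)
    for s in (0.0, 1.0, 4.0):
        print(s, standardized_mean_gap(dom_a, dom_b, s))
```

In this toy setting the residual path at `t = 0` coincides with the noised source and at `t = 1` with the noised target, so removing the noise can be left to a separate stage that only ever sees target-domain images.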