Decoupled Residual Diffusion Models for Unified and Data-Efficient Image-to-Image Translation

Abstract

We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a previously under-explored property in diffusion models. Crucially, beyond its conventional role of manifold lifting (i.e., moving data off low-dimensional manifolds), injecting Gaussian noise facilitates domain harmonization by implicitly aligning feature distributions across domains, a property particularly advantageous for unified I2I translation. However, existing diffusion models prematurely erode this harmonization effect, as noise and residuals are simultaneously removed in a single coupled diffusion process. To address this, DRDD decouples the diffusion process into two sequential and independent diffusion stages: (1) a stochastic noise diffusion for domain harmonization and manifold lifting, and (2) a deterministic residual diffusion that learns the core semantic mapping entirely within the fixed-noise domain. This decoupling preserves harmonization and manifold lifting effects throughout the transformation, substantially simplifying the learning of unified mappings across diverse tasks and domains. Notably, the noise diffusion stage is trained exclusively on abundant, unpaired target-domain images, greatly improving data efficiency. Comprehensive theoretical and empirical analysis demonstrates that DRDD is broadly compatible with mainstream diffusion models and consistently delivers robust, unified I2I translation, even under limited paired data. Our code is available at https://github.com/HKU-HealthAI/DRDD.
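The harmonization claim above can be illustrated with a toy calculation. The sketch below is not the DRDD method. It assumes two hypothetical 1-D Gaussian "domains" and the standard variance-preserving noising step x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps, and it only shows that the closed-form KL divergence between the two noised distributions shrinks as more Gaussian noise is injected. The names `gaussian_kl`, `noised_moments`, `domain_a`, and `domain_b` are illustrative and do not come from the paper or its repository.

```python
import numpy as np


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def noised_moments(mu, var, alpha_bar):
    """Mean and variance of sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    return np.sqrt(alpha_bar) * mu, alpha_bar * var + (1.0 - alpha_bar)


# Two hypothetical 1-D "domains" given as (mean, variance); assumed values for illustration.
domain_a = (-2.0, 0.25)
domain_b = (2.0, 0.5)

# alpha_bar = 1.0 means no noise; smaller values mean more injected noise.
alpha_bars = [1.0, 0.5, 0.1, 0.01]
kls = []
for alpha_bar in alpha_bars:
    mu_a, var_a = noised_moments(*domain_a, alpha_bar)
    mu_b, var_b = noised_moments(*domain_b, alpha_bar)
    kls.append(gaussian_kl(mu_a, var_a, mu_b, var_b))
    print(f"alpha_bar={alpha_bar:4.2f}  KL(A||B)={kls[-1]:.4f}")
```

In this toy setting the divergence falls monotonically as alpha_bar decreases, which is consistent with the intuition that a heavily noised representation makes the two domains harder to tell apart. Real image domains are not Gaussian, so this is only a simplified picture of the effect the abstract describes.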