Decoupled Residual Denoising Diffusion Models for Unified and Data-Efficient Image-to-Image Translation

Abstract

We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. Diffusion models have advanced I2I translation in both quality and diversity, yet one of their properties remains under-explored: beyond its conventional role of manifold lifting (i.e., moving data off low-dimensional manifolds), injecting Gaussian noise facilitates domain harmonization by implicitly aligning feature distributions across domains, which is particularly advantageous for unified I2I translation. However, existing diffusion models erode this harmonization effect prematurely, because noise and residuals are removed simultaneously in a single coupled diffusion process. To address this, DRDD decouples the diffusion process into two sequential, independent stages: (1) a stochastic noise diffusion for domain harmonization and manifold lifting, and (2) a deterministic residual diffusion that learns the core semantic mapping entirely within the fixed-noise domain. This decoupling preserves the harmonization and manifold lifting effects throughout the transformation, substantially simplifying the learning of unified mappings across diverse tasks and domains. Notably, the noise diffusion stage is trained exclusively on abundant, unpaired target-domain images, which greatly improves data efficiency. Comprehensive theoretical and empirical analysis demonstrates that DRDD is broadly compatible with mainstream diffusion models and consistently delivers robust, unified I2I translation, even under limited paired data. Our code is available at https://github.com/HKU-HealthAI/DRDD.
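
As a rough illustration of the two ideas above, the following toy sketch may help. It is not taken from the DRDD code base, and every function name in it is hypothetical. The first part uses the closed-form KL divergence between two equal-variance 1D Gaussians to show that adding the same Gaussian noise to two "domains" shrinks the divergence between them, which is one simple way to read the harmonization claim. The second part assumes a linear residual path with a single fixed noise sample, purely to show what "stochastic noise stage, then deterministic residual stage" could look like; the actual DRDD schedules may differ.

```python
import numpy as np


def gaussian_kl_equal_var(mu_a: float, mu_b: float, var: float) -> float:
    """Closed-form KL divergence between N(mu_a, var) and N(mu_b, var)."""
    return (mu_a - mu_b) ** 2 / (2.0 * var)


def kl_after_noise(mu_a: float, mu_b: float, var: float, sigma: float) -> float:
    """KL divergence after adding independent N(0, sigma^2) noise to both domains.

    Independent Gaussian noise leaves the means unchanged and adds sigma^2
    to each variance, so the divergence decreases as sigma grows.
    """
    return gaussian_kl_equal_var(mu_a, mu_b, var + sigma ** 2)


def decoupled_forward(source, target, sigma, steps, rng):
    """Toy two-stage forward path (an assumption, not the DRDD schedule).

    Stage 1 (stochastic): draw one noise sample and add it to the target.
    Stage 2 (deterministic): keep that noise fixed and move linearly along
    the residual (source - target) until the noisy source is reached.
    """
    eps = rng.standard_normal(target.shape)
    noisy_target = target + sigma * eps
    residual = source - target
    fractions = np.linspace(0.0, 1.0, steps)
    path = [noisy_target + f * residual for f in fractions]
    return path, eps


# Two toy "domains" with means 0 and 2 and unit variance.
kl_clean = kl_after_noise(0.0, 2.0, 1.0, sigma=0.0)  # 2.0
kl_noisy = kl_after_noise(0.0, 2.0, 1.0, sigma=1.0)  # 1.0

# Toy images: a constant source and an all-zero target.
rng = np.random.default_rng(0)
source = np.full(4, 2.0)
target = np.zeros(4)
path, eps = decoupled_forward(source, target, sigma=1.0, steps=5, rng=rng)
```

In this sketch, every point on `path` carries the same noise term `sigma * eps`, so the residual stage never changes the noise level; only the final denoising (not shown) would remove it. That is the sense in which the toy path stays "within the fixed-noise domain".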