Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
May 31, 2026
Authors: Ziyue Lin, Jiahe Hou, Hongyu Xia, Xinrui Xie, Feifei Wang, Yuyin Zhou, Wei Wang, Jiawei Liu, Liangqiong Qu
cs.AI
Abstract
We propose Decoupled Residual Denoising Diffusion models (DRDD) for unified and data-efficient image-to-image (I2I) translation. While diffusion models have advanced I2I translation in terms of quality and diversity, we uncover a previously under-explored property in diffusion models. Crucially, beyond its conventional role of manifold lifting (i.e., moving data off low-dimensional manifolds), injecting Gaussian noise facilitates domain harmonization by implicitly aligning feature distributions across domains, a property particularly advantageous for unified I2I translation. However, existing diffusion models prematurely erode this harmonization effect, as noise and residuals are simultaneously removed in a single coupled diffusion process. To address this, DRDD decouples the diffusion process into two sequential and independent diffusion stages: (1) a stochastic noise diffusion for domain harmonization and manifold lifting, and (2) a deterministic residual diffusion that learns the core semantic mapping entirely within the fixed-noise domain. This decoupling preserves harmonization and manifold lifting effects throughout the transformation, substantially simplifying the learning of unified mappings across diverse tasks and domains. Notably, the noise diffusion stage is trained exclusively on abundant, unpaired target-domain images, greatly improving data efficiency. Comprehensive theoretical and empirical analysis demonstrates that DRDD is broadly compatible with mainstream diffusion models and consistently delivers robust, unified I2I translation, even under limited paired data. Our code is available at https://github.com/HKU-HealthAI/DRDD.
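To make the two-stage idea concrete, the toy sketch below shows one possible reading of the decoupled forward process on small arrays. It is not the authors' implementation: the function names `noise_stage` and `residual_stage`, the linear residual schedule `t`, and the single fixed noise level `sigma` are all assumptions made for illustration, since the abstract does not specify the parameterization. Under these assumptions, stage 1 needs only a target-domain image and Gaussian noise, while stage 2 moves deterministically between the noisy target and the noisy source without changing the noise term.

```python
import numpy as np

def noise_stage(x_tgt, eps, s):
    """Stage 1 (assumed form): add Gaussian noise of scale s to a target-domain image.

    Only target-domain data is involved, so no paired source image is needed.
    """
    return x_tgt + s * eps

def residual_stage(x_src, x_tgt, eps, sigma, t):
    """Stage 2 (assumed form): deterministic interpolation along the residual.

    t = 0 gives the noisy target and t = 1 gives the noisy source.
    The noise term sigma * eps is held fixed for every t.
    """
    residual = x_src - x_tgt
    return x_tgt + t * residual + sigma * eps

# Toy paired "images" and one shared noise sample (hypothetical data).
rng = np.random.default_rng(0)
x_src = rng.uniform(size=(4, 4))
x_tgt = rng.uniform(size=(4, 4))
eps = rng.standard_normal((4, 4))
sigma = 1.0

noisy_tgt = noise_stage(x_tgt, eps, sigma)                      # end of stage 1
start = residual_stage(x_src, x_tgt, eps, sigma, t=0.0)         # start of stage 2
middle = residual_stage(x_src, x_tgt, eps, sigma, t=0.3)
end = residual_stage(x_src, x_tgt, eps, sigma, t=1.0)           # noisy source

# Noise content at an intermediate residual step, after removing the clean part.
noise_at_middle = middle - (x_tgt + 0.3 * (x_src - x_tgt))
```

In this sketch the two stages meet at the noisy target (`start` equals `noisy_tgt`), and the noise component stays equal to `sigma * eps` throughout stage 2, which is the sense in which the residual mapping is learned "entirely within the fixed-noise domain". A coupled process, by contrast, would change the noise scale and the residual weight at the same step.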