Coevolving Representations in Joint Image-Feature Diffusion

April 19, 2026
Authors: Theodoros Kouzelis, Spyros Gidaris, Nikos Komodakis
cs.AI

Abstract

Joint image-feature generative modeling has recently emerged as an effective strategy for improving diffusion training by coupling low-level VAE latents with high-level semantic features extracted from pre-trained visual encoders. However, existing approaches rely on a fixed representation space, constructed independently of the generative objective and kept unchanged during training. We argue that the representation space guiding diffusion should itself adapt to the generative task. To this end, we propose Coevolving Representation Diffusion (CoReDi), a framework in which the semantic representation space evolves during training by learning a lightweight linear projection jointly with the diffusion model. While naively optimizing this projection leads to degenerate solutions, we show that stable coevolution can be achieved through a combination of stop-gradient targets, normalization, and targeted regularization that prevents feature collapse. This formulation enables the semantic space to progressively specialize to the needs of image synthesis, improving its complementarity with image latents. We apply CoReDi to both VAE latent diffusion and pixel-space diffusion, demonstrating that adaptive semantic representations improve generative modeling across both settings. Experiments show that CoReDi achieves faster convergence and higher sample quality compared to joint diffusion models operating in fixed representation spaces.
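
The three stabilizing ingredients named in the abstract (stop-gradient targets, normalization, and a regularizer against feature collapse) can be illustrated with a small JAX sketch. This is not the authors' implementation. The abstract does not specify the normalization, the regularizer, or how the projection receives its learning signal, so everything below is an assumption made for illustration: per-dimension standardization over the batch, a decorrelation penalty on the projected features, the weight `lam`, the tensor shapes, and all function names are hypothetical. The frozen encoder features and the diffusion model's prediction are replaced by random stand-ins.

```python
import jax
import jax.numpy as jnp


def normalize(z, eps=1e-6):
    # Assumed normalization: standardize each projected dimension over the batch.
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)


def collapse_penalty(z):
    # Assumed regularizer: push the correlation matrix of the projected
    # features toward the identity so that dimensions do not become redundant.
    zn = normalize(z)
    cov = zn.T @ zn / zn.shape[0]
    return jnp.sum((cov - jnp.eye(cov.shape[0])) ** 2)


def target_loss(W, pred, feats):
    # Regress onto the projected semantic features. The target is wrapped in
    # stop_gradient, so this term cannot shrink by moving the target itself.
    target = jax.lax.stop_gradient(normalize(feats @ W))
    return jnp.mean((pred - target) ** 2)


def total_loss(W, pred, feats, lam=0.1):
    # lam is a hypothetical weighting of the collapse regularizer.
    return target_loss(W, pred, feats) + lam * collapse_penalty(feats @ W)


key_f, key_w, key_p = jax.random.split(jax.random.PRNGKey(0), 3)
feats = jax.random.normal(key_f, (32, 16))    # stand-in for frozen encoder features
W = 0.1 * jax.random.normal(key_w, (16, 4))   # lightweight linear projection
pred = jax.random.normal(key_p, (32, 4))      # stand-in for the model's prediction

grad_target = jax.grad(target_loss)(W, pred, feats)  # gradient w.r.t. W
grad_total = jax.grad(total_loss)(W, pred, feats)    # gradient w.r.t. W
```

In this toy setup the stop-gradient blocks the regression term from updating `W` at all, so `grad_target` is identically zero and only the regularizer contributes to `grad_total`. A full model would need some other path through which the projection is trained jointly with the diffusion network; that path is outside what the abstract describes and is omitted here.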