Diffusion Classifiers Understand Compositionality, but Conditions Apply
May 23, 2025
Authors: Yujin Jeong, Arnas Uselis, Seong Joon Oh, Anna Rohrbach
cs.AI
Abstract
Understanding visual scenes is fundamental to human intelligence. While
discriminative models have significantly advanced computer vision, they often
struggle with compositional understanding. In contrast, recent generative
text-to-image diffusion models excel at synthesizing complex scenes, suggesting
inherent compositional capabilities. Building on this, zero-shot diffusion
classifiers have been proposed to repurpose diffusion models for discriminative
tasks. While prior work offered promising results in discriminative
compositional scenarios, these results remain preliminary due to a small number
of benchmarks and a relatively shallow analysis of conditions under which the
models succeed. To address this, we present a comprehensive study of the
discriminative capabilities of diffusion classifiers on a wide range of
compositional tasks. Specifically, our study covers three diffusion models (SD
1.5, 2.0, and, for the first time, 3-m) spanning 10 datasets and over 30 tasks.
Further, we shed light on the role that the target dataset's domain plays in
performance; to isolate domain effects, we introduce a new diagnostic
benchmark, Self-Bench, composed of images created by diffusion models
themselves. Finally, we explore the importance of timestep weighting and
uncover a relationship between domain gap and timestep sensitivity,
particularly for SD3-m. To sum up, diffusion classifiers understand
compositionality, but conditions apply! Code and dataset are available at
https://github.com/eugene6923/Diffusion-Classifiers-Compositionality.
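The decision rule behind a zero-shot diffusion classifier, together with the timestep weighting mentioned in the abstract, can be illustrated with a minimal sketch. The general idea is to noise the image at several timesteps, ask the model to predict that noise under each candidate prompt, and pick the prompt with the lowest weighted prediction error. Everything concrete below is an assumption made for illustration and is not the authors' implementation: `toy_eps_prediction` is a hypothetical stand-in for a real text-conditioned noise predictor such as Stable Diffusion, prompts are represented by small prototype vectors, and `alpha_bar` is a toy noise schedule.

```python
import math
import random


def alpha_bar(t, num_steps):
    # Toy cosine-style schedule (assumption): fraction of signal kept at step t.
    return math.cos(0.5 * math.pi * (t + 0.5) / num_steps) ** 2


def toy_eps_prediction(x_t, a, prototype):
    # Hypothetical denoiser: it assumes the clean image equals the prompt's
    # prototype and solves the forward-noising equation for the noise.
    return [(xt - math.sqrt(a) * c) / math.sqrt(1.0 - a)
            for xt, c in zip(x_t, prototype)]


def weighted_error(image, prototype, weights, rng):
    # Sum over timesteps of w_t * ||eps - eps_hat(x_t, prompt)||^2.
    num_steps = len(weights)
    total = 0.0
    for t, w in enumerate(weights):
        a = alpha_bar(t, num_steps)
        eps = [rng.gauss(0.0, 1.0) for _ in image]
        x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e
               for x, e in zip(image, eps)]
        eps_hat = toy_eps_prediction(x_t, a, prototype)
        total += w * sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat))
    return total


def classify(image, prompts, weights, seed=0):
    # Re-seed per prompt so every prompt is scored on the same noise draws.
    scores = {}
    for name, prototype in prompts.items():
        rng = random.Random(seed)
        scores[name] = weighted_error(image, prototype, weights, rng)
    return min(scores, key=scores.get), scores


# Two prompts that differ only in attribute binding (toy prototypes).
prompts = {
    "a red cube and a blue sphere": [1.0, 0.0, 0.0, 1.0],
    "a blue cube and a red sphere": [0.0, 1.0, 1.0, 0.0],
}
image = [0.9, 0.1, 0.2, 0.8]

uniform_weights = [1.0] * 10
late_weights = [0.0] * 5 + [1.0] * 5  # only the noisier half of the steps

label_uniform, scores_uniform = classify(image, prompts, uniform_weights)
label_late, scores_late = classify(image, prompts, late_weights)
print(label_uniform, label_late)
```

In this toy setting the chosen prompt does not depend on the weights, because the stand-in denoiser's error is proportional to the distance between the image and the prototype at every timestep. With a real diffusion model the per-timestep errors can rank prompts differently, which is why the choice of `weights` can matter.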