Diffusion Classifiers Understand Compositionality, but Conditions Apply
May 23, 2025
Authors: Yujin Jeong, Arnas Uselis, Seong Joon Oh, Anna Rohrbach
cs.AI
Abstract
Understanding visual scenes is fundamental to human intelligence. While
discriminative models have significantly advanced computer vision, they often
struggle with compositional understanding. In contrast, recent generative
text-to-image diffusion models excel at synthesizing complex scenes, suggesting
inherent compositional capabilities. Building on this, zero-shot diffusion
classifiers have been proposed to repurpose diffusion models for discriminative
tasks. While prior work offered promising results in discriminative
compositional scenarios, these results remain preliminary due to a small number
of benchmarks and a relatively shallow analysis of conditions under which the
models succeed. To address this, we present a comprehensive study of the
discriminative capabilities of diffusion classifiers on a wide range of
compositional tasks. Specifically, our study covers three diffusion models (SD
1.5, 2.0, and, for the first time, 3-m) spanning 10 datasets and over 30 tasks.
Further, we shed light on the role that the domain of the target dataset plays in
performance; to isolate domain effects, we introduce a new diagnostic benchmark,
Self-Bench, composed of images generated by the diffusion models
themselves. Finally, we explore the importance of timestep weighting and
uncover a relationship between domain gap and timestep sensitivity,
particularly for SD3-m. To sum up, diffusion classifiers understand
compositionality, but conditions apply! Code and dataset are available at
https://github.com/eugene6923/Diffusion-Classifiers-Compositionality.
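The abstract does not spell out how a zero-shot diffusion classifier scores candidate captions. Below is a minimal sketch of the general idea, assuming a DDPM-style noise-prediction model: noise the image at several timesteps, ask the model to predict that noise under each candidate caption, and pick the caption with the lowest (optionally timestep-weighted) prediction error. Everything here is illustrative rather than the authors' implementation. `make_alpha_bars`, `toy_eps_model`, and `diffusion_classify` are hypothetical names. The linear beta schedule is an assumption: real SD 1.5/2.0 checkpoints ship their own schedulers, and SD3-m uses a flow-matching formulation. The toy "model" stands in for a real text-conditioned denoiser.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an ASSUMED DDPM-style linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


ALPHA_BARS = make_alpha_bars()


def toy_eps_model(x_t, t, cond):
    """Hypothetical stand-in for a text-conditioned noise predictor eps_theta(x_t, t, c).

    It pretends the clean image equals the conditioning vector `cond` and
    returns the noise that assumption implies, so a matching caption
    recovers the true noise and a mismatched one does not.
    """
    ab = ALPHA_BARS[t]
    return [(xt - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for xt, c in zip(x_t, cond)]


def diffusion_classify(x0, prompts, eps_model, timesteps, weight=lambda t: 1.0, seed=0):
    """Return the caption with the lowest weighted denoising error, plus all scores.

    x0:        clean image (here a flat list of floats)
    prompts:   mapping caption -> conditioning vector
    timesteps: timesteps at which to probe the model
    weight:    hypothetical per-timestep weight w(t); uniform by default
    """
    rng = random.Random(seed)
    # Draw one (t, noise) pair per timestep and reuse it for every caption,
    # so score differences come from the conditioning, not from sampling noise.
    probes = [(t, [rng.gauss(0.0, 1.0) for _ in x0]) for t in timesteps]

    scores = {}
    for caption, cond in prompts.items():
        total = 0.0
        for t, eps in probes:
            ab = ALPHA_BARS[t]
            # Forward diffusion: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps
            x_t = [math.sqrt(ab) * xi + math.sqrt(1.0 - ab) * ei for xi, ei in zip(x0, eps)]
            eps_hat = eps_model(x_t, t, cond)
            mse = sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat)) / len(x0)
            total += weight(t) * mse
        scores[caption] = total / len(probes)
    return min(scores, key=scores.get), scores


# Usage: two captions that differ only in attribute binding.
image = [0.5, -0.2, 0.8]
captions = {
    "a red cube and a blue sphere": [0.5, -0.2, 0.8],
    "a blue cube and a red sphere": [-0.5, 0.2, 0.1],
}
steps = [100, 300, 500, 700, 900]
pred, scores = diffusion_classify(image, captions, toy_eps_model, steps)
print(pred, scores)

# Same probes, but weighting later (noisier) timesteps more heavily.
pred_w, _ = diffusion_classify(image, captions, toy_eps_model, steps, weight=lambda t: t / 1000)
print(pred_w)
```

With a real model, `toy_eps_model` would be replaced by the diffusion backbone's conditional prediction, and the choice of `timesteps` and `weight` is the kind of knob whose sensitivity, per the abstract, relates to the domain gap, particularly for SD3-m.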