Diffusion Classifiers Understand Compositionality, but Conditions Apply

May 23, 2025
Authors: Yujin Jeong, Arnas Uselis, Seong Joon Oh, Anna Rohrbach
cs.AI

Abstract

Understanding visual scenes is fundamental to human intelligence. While discriminative models have significantly advanced computer vision, they often struggle with compositional understanding. In contrast, recent generative text-to-image diffusion models excel at synthesizing complex scenes, suggesting inherent compositional capabilities. Building on this, zero-shot diffusion classifiers have been proposed to repurpose diffusion models for discriminative tasks. While prior work offered promising results in discriminative compositional scenarios, these results remain preliminary due to a small number of benchmarks and a relatively shallow analysis of the conditions under which the models succeed. To address this, we present a comprehensive study of the discriminative capabilities of diffusion classifiers on a wide range of compositional tasks. Specifically, our study covers three diffusion models (SD 1.5, 2.0, and, for the first time, 3-m) spanning 10 datasets and over 30 tasks. Further, we shed light on the role that target dataset domains play in the respective performance; to isolate the domain effects, we introduce a new diagnostic benchmark, Self-Bench, composed of images created by the diffusion models themselves. Finally, we explore the importance of timestep weighting and uncover a relationship between domain gap and timestep sensitivity, particularly for SD3-m. To sum up, diffusion classifiers understand compositionality, but conditions apply! Code and dataset are available at https://github.com/eugene6923/Diffusion-Classifiers-Compositionality.
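
For readers unfamiliar with the idea, a zero-shot diffusion classifier is commonly described as scoring each candidate caption by how well the caption-conditioned model predicts the noise added to the image, then choosing the caption with the lowest error. The sketch below illustrates that general recipe with per-timestep weights. It is not the authors' implementation: `diffusion_classify`, `toy_predict_noise`, the prompt prototypes, the noise schedule, and the weight values are all hypothetical stand-ins, and the DDPM-style noising used here is an assumption that would not carry over unchanged to SD3-m.

```python
import numpy as np

def diffusion_classify(x0, prompts, predict_noise, alphas_cumprod,
                       timesteps, weights=None, n_samples=4, seed=0):
    """Return (index of best prompt, per-prompt weighted denoising error)."""
    rng = np.random.default_rng(seed)
    if weights is None:
        weights = np.ones(len(timesteps))      # uniform timestep weighting
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalise so scores are comparable
    scores = np.zeros(len(prompts))
    for w, t in zip(weights, timesteps):
        a = alphas_cumprod[t]
        for _ in range(n_samples):
            eps = rng.standard_normal(x0.shape)
            # DDPM-style forward noising of the clean input (assumption)
            x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
            # the same noisy sample is scored under every candidate prompt
            for i, prompt in enumerate(prompts):
                err = np.mean((eps - predict_noise(x_t, t, prompt)) ** 2)
                scores[i] += w * err / n_samples
    return int(np.argmin(scores)), scores

# Toy stand-in for a text-conditioned denoiser (hypothetical): each prompt
# maps to one prototype "image", and the predictor assumes the clean image
# equals that prototype.
alphas_cumprod = np.linspace(0.999, 0.01, 1000)
prototypes = {
    "a red cube": np.ones((4, 4)),
    "a blue sphere": -np.ones((4, 4)),
}

def toy_predict_noise(x_t, t, prompt):
    a = alphas_cumprod[t]
    return (x_t - np.sqrt(a) * prototypes[prompt]) / np.sqrt(1.0 - a)

prompts = list(prototypes)
x0 = prototypes["a red cube"] + 0.05 * np.random.default_rng(1).standard_normal((4, 4))

pred, scores = diffusion_classify(
    x0, prompts, toy_predict_noise, alphas_cumprod,
    timesteps=[100, 500, 900],
    weights=[0.2, 0.6, 0.2],   # illustrative weights, not values from the paper
)
print(prompts[pred], scores)
```

In a real setting `predict_noise` would wrap a pretrained text-to-image model, and the choice of `timesteps` and `weights` is exactly the knob that the abstract reports as mattering, especially when the test images come from a different domain than the model's own generations.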
