IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
October 9, 2024
Authors: Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui
cs.AI
Abstract
Advanced diffusion models such as RPG, Stable Diffusion 3, and FLUX have made notable strides in compositional text-to-image generation. However, these methods typically exhibit distinct strengths in compositional generation, with some excelling at attribute binding and others at spatial relationships. This disparity highlights the need for an approach that leverages the complementary strengths of various models to comprehensively improve composition capability. To this end, we introduce IterComp, a novel framework that aggregates composition-aware model preferences from multiple models and employs iterative feedback learning to enhance compositional generation. Specifically, we curate a gallery of six powerful open-source diffusion models and evaluate them on three key compositional metrics: attribute binding, spatial relationships, and non-spatial relationships. Based on these metrics, we develop a composition-aware model preference dataset comprising numerous image-rank pairs, which we use to train composition-aware reward models. We then propose an iterative feedback learning method that enhances compositionality in a closed-loop manner, enabling the progressive self-refinement of both the base diffusion model and the reward models over multiple iterations. Theoretical proof demonstrates the effectiveness of this approach, and extensive experiments show its significant superiority over previous SOTA methods (e.g., Omost and FLUX), particularly in multi-category object composition and complex semantic alignment. IterComp opens new research avenues in reward feedback learning for diffusion models and compositional generation. Code: https://github.com/YangLing0818/IterComp
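The closed-loop recipe described above pairs reward models trained on image-rank pairs with reward-guided refinement of the base model. Below is a minimal sketch of the reward-model side, assuming a Bradley-Terry-style pairwise ranking loss over preferred/rejected image features; the names (`RewardModel`, `ranking_loss`, `img_w`, `img_l`) are illustrative placeholders, not IterComp's actual API.

```python
# Minimal sketch, assuming a Bradley-Terry-style pairwise ranking loss.
# All names here are hypothetical placeholders, not IterComp's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores an (image, prompt) feature pair; higher = better composition."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Stand-in head over precomputed features; a real system would
        # score encoder outputs (e.g., CLIP embeddings) instead.
        self.head = nn.Sequential(
            nn.Linear(feat_dim * 2, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([img_feat, txt_feat], dim=-1)).squeeze(-1)

def ranking_loss(rm: RewardModel, img_w, img_l, txt) -> torch.Tensor:
    """Pairwise loss: the preferred image (img_w) should outscore img_l."""
    return -F.logsigmoid(rm(img_w, txt) - rm(img_l, txt)).mean()

rm = RewardModel()
opt = torch.optim.AdamW(rm.parameters(), lr=1e-4)

# Random features standing in for encoded (winner, loser, prompt) triples
# drawn from an image-rank dataset collected over a model gallery.
img_w, img_l, txt = (torch.randn(8, 512) for _ in range(3))

loss = ranking_loss(rm, img_w, img_l, txt)
opt.zero_grad()
loss.backward()
opt.step()
print(f"ranking loss: {loss.item():.4f}")

# The closed loop would then alternate: fine-tune the diffusion model
# against the reward signal, regenerate and re-rank images, and refit
# the reward models over multiple iterations (omitted here).
```

The pairwise form is a common choice for preference data because it needs only relative rankings between images, not absolute quality scores, which matches the image-rank pairs collected across the gallery.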