IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
October 9, 2024
Authors: Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui
cs.AI
Abstract
Advanced diffusion models like RPG, Stable Diffusion 3 and FLUX have made
notable strides in compositional text-to-image generation. However, these
methods typically exhibit distinct strengths for compositional generation, with
some excelling in handling attribute binding and others in spatial
relationships. This disparity highlights the need for an approach that can
leverage the complementary strengths of various models to comprehensively
improve the composition capability. To this end, we introduce IterComp, a novel
framework that aggregates composition-aware model preferences from multiple
models and employs an iterative feedback learning approach to enhance
compositional generation. Specifically, we curate a gallery of six powerful
open-source diffusion models and evaluate them on three key compositional
metrics: attribute binding, spatial relationships, and non-spatial
relationships. Based on these metrics, we develop a composition-aware model
preference dataset comprising numerous image-rank pairs to train
composition-aware reward models. Then, we propose an iterative feedback
learning method to enhance compositionality in a closed-loop manner, enabling
the progressive self-refinement of both the base diffusion model and reward
models over multiple iterations. A theoretical proof demonstrates the
effectiveness of our approach, and extensive experiments show its significant
superiority over
previous SOTA methods (e.g., Omost and FLUX), particularly in multi-category
object composition and complex semantic alignment. IterComp opens new research
avenues in reward feedback learning for diffusion models and compositional
generation. Code: https://github.com/YangLing0818/IterComp
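To make the closed loop described above concrete, here is a minimal PyTorch sketch of the two alternating phases: fitting composition-aware reward models on image-rank pairs, then fine-tuning the base generator against their summed rewards, repeated over several iterations. All names (RewardModel, ToyGenerator, feedback_round), shapes, and hyperparameters are invented for illustration; the actual method operates on diffusion models and ranked images from the model gallery, not on toy embeddings.

```python
# A minimal, self-contained sketch of IterComp-style iterative
# composition-aware feedback learning. Everything here is an illustrative
# assumption, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 512  # assumed embedding size, for illustration only

class RewardModel(nn.Module):
    """Scores an image embedding on one compositional axis
    (attribute binding, spatial, or non-spatial relationships)."""
    def __init__(self, dim: int = DIM):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb).squeeze(-1)

def pairwise_ranking_loss(rm, winner, loser):
    """Bradley-Terry loss on an image-rank pair: the higher-ranked
    gallery image should receive the larger reward."""
    return -F.logsigmoid(rm(winner) - rm(loser)).mean()

class ToyGenerator(nn.Module):
    """Placeholder for the base diffusion model: maps a prompt
    embedding to an image embedding so the loop runs end to end."""
    def __init__(self, dim: int = DIM):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:
        return self.net(prompt_emb)

def feedback_round(generator, reward_models, rank_pairs, prompts, steps=100):
    """One closed-loop iteration: (1) refit the reward models on the current
    preference pairs, then (2) fine-tune the generator to maximize reward."""
    for rm in reward_models:
        rm.requires_grad_(True)
    rm_opt = torch.optim.Adam(
        [p for rm in reward_models for p in rm.parameters()], lr=1e-4)
    for _ in range(steps):
        rm_opt.zero_grad()
        loss = sum(pairwise_ranking_loss(rm, w, l)
                   for rm, (w, l) in zip(reward_models, rank_pairs))
        loss.backward()
        rm_opt.step()

    for rm in reward_models:
        rm.requires_grad_(False)  # freeze rewards while tuning the generator
    gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    for _ in range(steps):
        gen_opt.zero_grad()
        reward = sum(rm(generator(prompts)).mean() for rm in reward_models)
        (-reward).backward()  # gradient ascent on the summed rewards
        gen_opt.step()

# Synthetic stand-ins for the curated gallery rankings and prompt set.
reward_models = [RewardModel() for _ in range(3)]  # one per metric
rank_pairs = [(torch.randn(32, DIM), torch.randn(32, DIM)) for _ in range(3)]
prompts = torch.randn(32, DIM)
generator = ToyGenerator()

for _ in range(3):  # multiple iterations -> progressive self-refinement
    feedback_round(generator, reward_models, rank_pairs, prompts)
    # In IterComp, new generations would be re-ranked and appended to the
    # preference dataset here, closing the feedback loop.
```

The Bradley-Terry pairwise objective is one standard way to turn image rankings into a differentiable reward signal; whether IterComp uses exactly this loss is an assumption of the sketch.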