IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
October 9, 2024
Authors: Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui
cs.AI
Abstract
Advanced diffusion models like RPG, Stable Diffusion 3 and FLUX have made
notable strides in compositional text-to-image generation. However, these
methods typically exhibit distinct strengths for compositional generation, with
some excelling in handling attribute binding and others in spatial
relationships. This disparity highlights the need for an approach that can
leverage the complementary strengths of various models to comprehensively
improve the composition capability. To this end, we introduce IterComp, a novel
framework that aggregates composition-aware model preferences from multiple
models and employs an iterative feedback learning approach to enhance
compositional generation. Specifically, we curate a gallery of six powerful
open-source diffusion models and evaluate them on three key compositional
metrics: attribute binding, spatial relationships, and non-spatial
relationships. Based on these metrics, we develop a composition-aware model
preference dataset comprising numerous image-rank pairs to train
composition-aware reward models. Then, we propose an iterative feedback
learning method to enhance compositionality in a closed-loop manner, enabling
the progressive self-refinement of both the base diffusion model and reward
models over multiple iterations. A theoretical proof demonstrates the
effectiveness of our approach, and extensive experiments show its significant
superiority over
previous SOTA methods (e.g., Omost and FLUX), particularly in multi-category
object composition and complex semantic alignment. IterComp opens new research
avenues in reward feedback learning for diffusion models and compositional
generation. Code: https://github.com/YangLing0818/IterComp
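To make the closed loop described above concrete, here is a minimal PyTorch sketch of the two alternating phases: fitting composition-aware reward models on image-rank pairs, then fine-tuning the base generator against their summed rewards, repeated over several iterations. All names (RewardModel, ToyGenerator, feedback_round), shapes, and hyperparameters are invented for illustration; the actual method operates on diffusion models and ranked images from the model gallery, not on toy embeddings.

```python
# A minimal, self-contained sketch of IterComp-style iterative
# composition-aware feedback learning. Everything here is an illustrative
# assumption, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 512  # assumed embedding size, for illustration only

class RewardModel(nn.Module):
    """Scores an image embedding on one compositional axis
    (attribute binding, spatial, or non-spatial relationships)."""
    def __init__(self, dim: int = DIM):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.head(emb).squeeze(-1)

def pairwise_ranking_loss(rm, winner, loser):
    """Bradley-Terry loss on an image-rank pair: the higher-ranked
    gallery image should receive the larger reward."""
    return -F.logsigmoid(rm(winner) - rm(loser)).mean()

class ToyGenerator(nn.Module):
    """Placeholder for the base diffusion model: maps a prompt
    embedding to an image embedding so the loop runs end to end."""
    def __init__(self, dim: int = DIM):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:
        return self.net(prompt_emb)

def feedback_round(generator, reward_models, rank_pairs, prompts, steps=100):
    """One closed-loop iteration: (1) refit the reward models on the current
    preference pairs, then (2) fine-tune the generator to maximize reward."""
    for rm in reward_models:
        rm.requires_grad_(True)
    rm_opt = torch.optim.Adam(
        [p for rm in reward_models for p in rm.parameters()], lr=1e-4)
    for _ in range(steps):
        rm_opt.zero_grad()
        loss = sum(pairwise_ranking_loss(rm, w, l)
                   for rm, (w, l) in zip(reward_models, rank_pairs))
        loss.backward()
        rm_opt.step()

    for rm in reward_models:
        rm.requires_grad_(False)  # freeze rewards while tuning the generator
    gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    for _ in range(steps):
        gen_opt.zero_grad()
        reward = sum(rm(generator(prompts)).mean() for rm in reward_models)
        (-reward).backward()  # gradient ascent on the summed rewards
        gen_opt.step()

# Synthetic stand-ins for the curated gallery rankings and prompt set.
reward_models = [RewardModel() for _ in range(3)]  # one per metric
rank_pairs = [(torch.randn(32, DIM), torch.randn(32, DIM)) for _ in range(3)]
prompts = torch.randn(32, DIM)
generator = ToyGenerator()

for _ in range(3):  # multiple iterations -> progressive self-refinement
    feedback_round(generator, reward_models, rank_pairs, prompts)
    # In IterComp, new generations would be re-ranked and appended to the
    # preference dataset here, closing the feedback loop.
```

The Bradley-Terry pairwise objective is one standard way to turn image rankings into a differentiable reward signal; whether IterComp uses exactly this loss is an assumption of the sketch.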