DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
March 17, 2025
Authors: Dewei Zhou, Mingwei Li, Zongxin Yang, Yi Yang
cs.AI
Abstract
Image-conditioned generation methods, such as depth- and Canny-conditioned
approaches, have demonstrated remarkable abilities for precise image synthesis.
However, existing models still struggle to accurately control the content of
multiple instances (or regions). Even state-of-the-art models like FLUX and
3DIS face challenges, such as attribute leakage between instances, which limits
user control. To address these issues, we introduce DreamRenderer, a
training-free approach built upon the FLUX model. DreamRenderer enables users
to control the content of each instance via bounding boxes or masks, while
ensuring overall visual harmony. We propose two key innovations: 1) Bridge
Image Tokens for Hard Text Attribute Binding, which uses replicated image
tokens as bridge tokens to ensure that T5 text embeddings, pre-trained solely
on text data, bind the correct visual attributes for each instance during Joint
Attention; 2) Hard Image Attribute Binding applied only to vital layers.
Through our analysis of FLUX, we identify the critical layers responsible for
instance attribute rendering and apply Hard Image Attribute Binding only in
these layers, using soft binding in the others. This approach ensures precise
control while preserving image quality. Evaluations on the COCO-POS and
COCO-MIG benchmarks demonstrate that DreamRenderer improves the Image Success
Ratio by 17.7% over FLUX and enhances the performance of layout-to-image models
like GLIGEN and 3DIS by up to 26.8%. Project Page:
https://limuloo.github.io/DreamRenderer/.
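The abstract describes Bridge Image Tokens for Hard Text Attribute Binding only at a high level. Below is a minimal Python sketch of the kind of joint-attention mask that idea implies, written under our own assumptions and not taken from the authors' code. It assumes a sequence laid out as per-instance T5 text tokens, then image tokens, then bridge tokens (one replicated copy of each image token inside an instance region). It also assumes that an instance's text and bridge tokens attend only to each other, and that real image tokens follow a simple soft rule. The function name `build_text_binding_mask` and the token layout are hypothetical. In a real model the bridge tokens would carry copied image features; we assume their outputs are discarded after the attention step.

```python
"""Hypothetical sketch: joint-attention mask for hard text-attribute binding
using bridge image tokens. Layout and rules are illustrative assumptions.
"""
from typing import List, Tuple

Token = Tuple[str, int]  # (kind, instance id); instance -1 = background


def build_text_binding_mask(
    text_owner: List[int], image_region: List[int]
) -> Tuple[List[Token], List[List[bool]]]:
    """Return the token layout and mask[q][k] = True if token q may attend to token k."""
    # Bridge tokens: one replicated copy of every image token inside an instance region.
    bridge_owner = [inst for inst in image_region if inst >= 0]
    tokens: List[Token] = (
        [("text", i) for i in text_owner]
        + [("image", i) for i in image_region]
        + [("bridge", i) for i in bridge_owner]
    )
    mask = [[False] * len(tokens) for _ in tokens]
    for q, (q_kind, q_inst) in enumerate(tokens):
        for k, (k_kind, k_inst) in enumerate(tokens):
            if q_kind in ("text", "bridge"):
                # Hard text binding: an instance's text tokens and its bridge tokens
                # form a closed group, so other instances cannot leak in via attention.
                mask[q][k] = k_kind in ("text", "bridge") and k_inst == q_inst
            else:
                # Real image tokens (simplified soft rule): all image tokens, plus the
                # text of their own instance; background tokens see image tokens only.
                mask[q][k] = k_kind == "image" or (
                    k_kind == "text" and q_inst >= 0 and k_inst == q_inst
                )
    return tokens, mask


if __name__ == "__main__":
    tokens, mask = build_text_binding_mask(text_owner=[0, 0, 1], image_region=[0, 1, -1, 1])
    for (kind, inst), row in zip(tokens, mask):
        visible = [k for k, ok in enumerate(row) if ok]
        print(f"{kind:6s} inst={inst:2d} -> {visible}")
```

Take three text tokens (`[0, 0, 1]`) and four image tokens (`[0, 1, -1, 1]`). In that case the first text token of instance 0 can attend only to the two instance-0 text tokens and the single instance-0 bridge token. Attributes described for instance 1 therefore cannot reach it through this attention step.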
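The second idea applies hard image attribute binding only in vital layers and soft binding elsewhere. It can be sketched as a per-layer choice of image-to-image attention mask. This is a hypothetical illustration. The functions `image_attention_mask` and `layer_masks`, the toy layer count, and the vital-layer indices are placeholders, not the layers the authors identified in FLUX. We also assume that background tokens keep global attention in hard layers. Text-to-image attention is omitted to keep the focus on the layer schedule.

```python
"""Hypothetical sketch: hard image-attribute binding only in selected ("vital") layers.

Assumptions for illustration: image tokens are labelled with an instance id
(-1 = background), and only image-to-image attention is modelled.
"""
from typing import List, Set

Mask = List[List[bool]]


def image_attention_mask(image_region: List[int], hard: bool) -> Mask:
    """Return mask[q][k] = True if image token q may attend to image token k."""
    n = len(image_region)
    mask: Mask = []
    for q in range(n):
        row = []
        for k in range(n):
            if not hard or image_region[q] < 0:
                # Soft binding (or background token): keep global image context.
                row.append(True)
            else:
                # Hard binding: an instance token sees only tokens of the same instance.
                row.append(image_region[k] == image_region[q])
        mask.append(row)
    return mask


def layer_masks(num_layers: int, vital_layers: Set[int], image_region: List[int]) -> List[Mask]:
    """Hard binding in vital layers, soft binding in all other layers."""
    return [
        image_attention_mask(image_region, hard=layer in vital_layers)
        for layer in range(num_layers)
    ]


if __name__ == "__main__":
    # Toy 4-token image: two tokens of instance 0, one of instance 1, one background.
    region = [0, 0, 1, -1]
    # Hypothetical vital layers; the real indices come from the paper's analysis of FLUX.
    masks = layer_masks(num_layers=4, vital_layers={1, 2}, image_region=region)
    for layer, m in enumerate(masks):
        print(layer, [sum(row) for row in m])
```

Restricting hard masks to a few layers mirrors the trade-off the abstract describes. Instance tokens are isolated in the layers where attributes are rendered, while the remaining layers keep global image context for overall visual harmony.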