DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
March 17, 2025
Authors: Dewei Zhou, Mingwei Li, Zongxin Yang, Yi Yang
cs.AI
Abstract
Image-conditioned generation methods, such as depth- and Canny-edge-conditioned
approaches, have demonstrated remarkable abilities for precise image synthesis.
However, existing models still struggle to accurately control the content of
multiple instances (or regions). Even state-of-the-art models like FLUX and
3DIS face challenges, such as attribute leakage between instances, which limits
user control. To address these issues, we introduce DreamRenderer, a
training-free approach built upon the FLUX model. DreamRenderer enables users
to control the content of each instance via bounding boxes or masks, while
ensuring overall visual harmony. We propose two key innovations: 1) Bridge
Image Tokens for Hard Text Attribute Binding, which uses replicated image
tokens as bridge tokens to ensure that T5 text embeddings, pre-trained solely
on text data, bind the correct visual attributes for each instance during Joint
Attention; 2) Hard Image Attribute Binding applied only to vital layers.
Through our analysis of FLUX, we identify the critical layers responsible for
instance attribute rendering and apply Hard Image Attribute Binding only in
these layers, using soft binding in the others. This approach ensures precise
control while preserving image quality. Evaluations on the COCO-POS and
COCO-MIG benchmarks demonstrate that DreamRenderer improves the Image Success
Ratio by 17.7% over FLUX and enhances the performance of layout-to-image models
like GLIGEN and 3DIS by up to 26.8%. Project Page:
https://limuloo.github.io/DreamRenderer/.
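The contrast between hard and soft image attribute binding can be pictured as a choice of attention mask per layer. The sketch below is illustrative only and is not taken from the DreamRenderer code: it assumes that "hard" binding restricts the image tokens inside an instance's bounding box to attend only to image tokens in that same box, that "soft" binding leaves image-to-image attention unrestricted, and that vital layers are given as a set of layer indices. The names `box_token_ids`, `build_image_mask`, and `masks_per_layer` are hypothetical, and text tokens are omitted for brevity.

```python
from typing import List, Set, Tuple

# (x0, y0, x1, y1), half-open, measured in image-token grid cells (assumed layout)
Box = Tuple[int, int, int, int]


def box_token_ids(box: Box, width: int) -> List[int]:
    """Flattened row-major indices of the image tokens covered by a box."""
    x0, y0, x1, y1 = box
    return [y * width + x for y in range(y0, y1) for x in range(x0, x1)]


def build_image_mask(boxes: List[Box], height: int, width: int,
                     hard: bool) -> List[List[bool]]:
    """mask[q][k] is True when image token q may attend to image token k."""
    n = height * width
    # Soft binding (assumed): every image token sees the whole image.
    mask = [[True] * n for _ in range(n)]
    if not hard:
        return mask
    # Hard binding (assumed): tokens inside a box see only that box.
    # If boxes overlap, the later box overwrites the earlier one here.
    for box in boxes:
        inside = set(box_token_ids(box, width))
        for q in inside:
            for k in range(n):
                mask[q][k] = k in inside
    return mask


def masks_per_layer(boxes: List[Box], height: int, width: int,
                    num_layers: int,
                    vital_layers: Set[int]) -> List[List[List[bool]]]:
    """Use the hard mask only in the (hypothetical) vital layers."""
    return [build_image_mask(boxes, height, width, hard=(layer in vital_layers))
            for layer in range(num_layers)]


if __name__ == "__main__":
    # 2x2 token grid, one instance covering the left column, layer 1 is "vital".
    for layer, m in enumerate(masks_per_layer([(0, 0, 1, 2)], 2, 2, 3, {1})):
        print(layer, m[0])
```

Tokens outside every box keep full attention in this sketch, which is one simple way to let the background stay globally coherent; how the actual method treats background and text tokens is not specified in the abstract.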