
MultiBooth: Towards Generating All Your Concepts in an Image from Text

April 22, 2024
Authors: Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Li Xiu
cs.AI

Abstract

This paper introduces MultiBooth, a novel and efficient technique for multi-concept customization in image generation from text. Despite the significant advancements in customized generation methods, particularly with the success of diffusion models, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. During the single-concept learning phase, we employ a multi-modal image encoder and an efficient concept encoding technique to learn a concise and discriminative representation for each concept. In the multi-concept integration phase, we use bounding boxes to define the generation area for each concept within the cross-attention map. This method enables the creation of individual concepts within their specified regions, thereby facilitating the formation of multi-concept images. This strategy not only improves concept fidelity but also reduces additional inference cost. MultiBooth surpasses various baselines in both qualitative and quantitative evaluations, showcasing its superior performance and computational efficiency. Project Page: https://multibooth.github.io/
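
To make the bounding-box idea concrete, the following is a minimal NumPy sketch of one way to restrict concept tokens to their boxes inside a cross-attention map. It is an illustrative assumption, not the authors' implementation: the function names (`box_mask`, `region_masked_attention`), the masking-by-negative-infinity approach, and the input layout are all hypothetical, and the abstract does not specify how MultiBooth applies the boxes internally.

```python
import numpy as np


def box_mask(height, width, box):
    """Flattened boolean mask, True inside box = (x0, y0, x1, y1) on the latent grid."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask.reshape(-1)


def region_masked_attention(q, k, v, height, width, concept_tokens, boxes):
    """Cross-attention where each concept's tokens are visible only inside its box.

    q: (height*width, d) image queries; k, v: (T, d) text keys and values.
    concept_tokens: hypothetical dict mapping concept name -> list of token indices.
    boxes: hypothetical dict mapping concept name -> (x0, y0, x1, y1).
    Assumes at least one token (e.g. a start token) belongs to no concept,
    so every pixel keeps at least one finite score.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (height*width, T)
    for name, token_ids in concept_tokens.items():
        outside = np.where(~box_mask(height, width, boxes[name]))[0]
        # Pixels outside the box cannot attend to this concept's tokens.
        scores[np.ix_(outside, token_ids)] = -np.inf
    # Numerically stable softmax over the token axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With two concepts assigned to the left and right halves of the grid, the returned attention weights for a concept's tokens are exactly zero at every pixel outside its box, so each concept can only influence its own region.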

